ID |
Date |
Author |
Type |
Category |
Subject |
9007
|
Tue Aug 13 17:20:54 2013 |
Koji | Update | CDS | [Fixed] c1iscex needs help |
The c1x01 timing issue has been solved. All of the models on c1iscex are now running nicely.
Symptoms
- c1x01 was synchronized to 1PPS instead of TDS
- C1:DAQ-DC0_C1X01_STATUS (Upper right indicator) was red. The bits were 0x4000 or 0x2bad.
C1:DAQ-DC0_C1X01_CRC_SUM kept increasing
- c1scx, c1spx, c1asx could not get started.
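One of the symptoms above is an error counter (C1:DAQ-DC0_C1X01_CRC_SUM) that keeps increasing. A minimal sketch of how that condition could be checked from a series of readings, assuming the readings have already been collected by some other means. The helper name `keeps_increasing` and the sample values are made up for illustration and are not part of any CDS tool.

```python
def keeps_increasing(samples, min_steps=3):
    """Return True if the counter rose on at least `min_steps` consecutive readings.

    `samples` is a list of successive readings of a counter such as
    C1:DAQ-DC0_C1X01_CRC_SUM (how they are read out is not shown here).
    """
    run = 0
    for prev, cur in zip(samples, samples[1:]):
        # Extend the run on a rise, reset it on a flat or falling step.
        run = run + 1 if cur > prev else 0
        if run >= min_steps:
            return True
    return False


# Made-up readings, for illustration only.
healthy = [12, 12, 12, 12, 12]
broken = [12, 15, 19, 24, 30]
print(keeps_increasing(healthy))  # False
print(keeps_increasing(broken))   # True
```

Requiring several consecutive rises, rather than a single one, avoids flagging an isolated glitch as a persistent problem.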
Solution
- Log in to c1iscex: "ssh c1iscex"
- Run "sudo shutdown -h now"
- Walk down to the X end rack
- Make sure the supply voltages for the electronics are correct (See Steve's entry)
- Make sure the machine has already shut down.
- Unplug the two AC power supply cables of the machine.
- Turn off the front panel switch of the IO chassis
- Wait for 10 sec
- Turn on the IO chassis
- Plug the AC power supply cables back into the machine
- Push the power switch of the realtime machine |
9020
|
Fri Aug 16 21:15:04 2013 |
rana | Update | CDS | New/old CDS laptop for X-End |
I took the "aso-laptop" and installed Ubuntu on it a couple of months ago. Today I added it to the Martian network and then moved it to the X End.
I followed the instructions in (https://wiki-40m.ligo.caltech.edu/Network) and added it to the files in /var/named/chroot/var/named on linux1 and did the "service named restart".
The router already had his MAC address in its list (because Yoichi was illegally using his personal laptop on the Martian network). The new laptop's name is 'asia'. This is a legal name according to our computer naming conventions and this Wiktionary page (http://en.wiktionary.org/wiki/Category:Italian_female_given_names). It has been added to the Name Pool on the wiki.
The terminal on the laptop still calls itself 'aso-laptop' so I need some help in fixing that. It successfully connects to 40MARS and displays a MEDM sitemap after sshing in to pianosa.
I use 'ssh -X -C' since I find that compression actually helps when the laptops are so far from the router. |
9021
|
Sun Aug 18 16:04:07 2013 |
rana | Summary | CDS | FB lights all RED: mxstream restart |
Sun Aug 18 15:52:50 2013
Found the FB lights (C1:FEC-NN_FB_NET_STATUS and C1:DAQ-DC0_C1XXX_STATUS) RED for everything on the CDS_FE_STATUS screen.
I used the (! mxstream restart) button to restart the mxstreams. Everything is green now.
PMC was out of lock; I relocked it, and the IMC locked itself, as did the X & Y arms on IR. X was already green locked. |
Attachment 1: IFO-Trend.png
|
|
9022
|
Sun Aug 18 17:56:16 2013 |
rana | Summary | CDS | MEDM Screen CPU Usages |
I noticed at LLO (?) that the LSC screen there uses up ~25-30% of the CPU time on a single core for the control room iMac workstations - this seems excessive.
Here is an accounting of CPU usage percentages for some of our screens:
Screen Name | CPU (%) |
LSC_OVERVIEW | 7 |
ALS_OVERVIEW | 0 |
ALS | 1 |
SUS_SUMMARY | 0 |
IOO_WFS_MASTER | 0.3 |
OPLEV_MASTER | 0.5 |
These were measured using the program 'glances' on rosalba. MEDM running with only the sitemap used up 0.9% of a CPU. With the screens running, the fluctuation from sample to sample could be ~ +/- 0.5%. While the LSC screen seems to be the biggest pig, it is only big in comparison to small pigs. Certainly this pig has gotten bigger after getting sent to Louisiana. |
Attachment 1: obama1404_666531c.jpg
|
|
9066
|
Mon Aug 26 19:54:15 2013 |
manasa | Update | CDS | c1als model modified |
I made changes to the c1als model a couple of weeks ago. I removed all the beat_coarse channels that had existed since the pre-phase-tracker era.
Also, I forgot to elog about it then. This dawned on me only when I found that c1als isn't working the way it should right now.

|
9070
|
Tue Aug 27 15:44:08 2013 |
manasa | Update | CDS | Issues with ALS fixed |
I figured out the problem with ALS from yesterday. While the model was just fine, the MEDM screens had not been checked to confirm that they were reading the correct channel names.
The channel names of the ALS input matrix elements had changed when the coarse channels were deleted from the c1als model. So the error signals were not reaching the servo modules as expected. This is why I was not able to make sense of what the ALS was doing.
All is fixed now and should be back to normal |
9074
|
Tue Aug 27 19:34:36 2013 |
Jamie | Configuration | CDS | front end IPC configuration |
So the IPC situation on the front end network is not so great right now. For various no-longer-valid reasons, c1lsc had no RFM card, so all the IPC connections were routed through the c1rfm model on c1sus and passed on to c1lsc via dolphin PCIe as needed. As things grew, c1rfm became overloaded. Koji tried to fix the situation by breaking things out of c1rfm to make direct connections where we could. This cleared up c1rfm a bit, but now c1mcs is overloaded.
Reminder: PCIe (dolphin) is faster and higher bandwidth than RFM. The more things we can put on PCIe the better.
Attached is a graph of my rough accounting of the intended direct IPC connections between the front ends. By "intended direct" I mean what should be direct connections if we had all the appropriate hardware. Right now the actual connection graph is more convoluted than this since things are passing through c1rfm. I note this graph was NOT particularly easy to make, which is very unfortunate. I had to manually look through every model and determine the ultimate source of every incoming IPC. Kind of a pain in the butt. It would be nice if there was a simple way to represent this.
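A rough sketch of how the host-level accounting could be tallied once the model-level connections are written down, using the same (from, to, number) tuple layout as the 40m-ipcs-graph.py attachment in entry 9087. The model-to-host mapping below is an assumption made for illustration (in particular the placement of c1scy on c1iscey), and `host_level_counts` is a hypothetical helper, not part of that script.

```python
from collections import defaultdict

# Model-level IPC connections: (from, to, number).
# Only a few illustrative entries are listed here.
ipcs = [
    ('c1scx', 'c1lsc', 1),
    ('c1scy', 'c1lsc', 1),
    ('c1oaf', 'c1lsc', 8),
    ('c1scx', 'c1ass', 1),
    ('c1scy', 'c1ass', 1),
]

# ASSUMPTION: which front-end host runs each model. Adjust to the real layout.
model_host = {
    'c1scx': 'c1iscex',
    'c1scy': 'c1iscey',
    'c1lsc': 'c1lsc',
    'c1oaf': 'c1lsc',
    'c1ass': 'c1lsc',
}


def host_level_counts(ipcs, model_host):
    """Sum IPC counts per (source host, destination host) pair.

    Connections between two models on the same host are skipped, since
    they do not appear as edges between hosts.
    """
    counts = defaultdict(int)
    for src, dst, n in ipcs:
        hsrc, hdst = model_host[src], model_host[dst]
        if hsrc != hdst:
            counts[(hsrc, hdst)] += n
    return dict(counts)


for (hsrc, hdst), n in sorted(host_level_counts(ipcs, model_host).items()):
    print(f"{hsrc} -> {hdst}: {n}")
```

Keeping the model-to-host mapping separate from the connection list means a proposed hardware change (for example, moving a model to another host) only requires editing the mapping before re-running the tally.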
Here are some various solutions to the problem as I see it:
a) put c1lsc on the RFM network
This would allow c1lsc to talk to c1ioo, c1iscex, and c1iscey without having to go through c1sus, thereby eliminating c1rfm altogether. I'm not sure why we didn't just do this originally.
Requires:
b) put c1ioo on the PCIe network (and move c1sus's RFM card to c1lsc)
This is probably the most robust solution.
b1) There are roughly 8 IPCs going from c1ioo to c1sus, 4 going the other way, and 3 IPCs from c1ioo to c1lsc. If we put c1ioo on PCIe, all of these connections, which now go over RFM, would become direct PCIe connections, which would be a big win.
At this point only the end station front ends would be on RFM, and most of the connections to those come from c1lsc, so it would make sense to give c1lsc the RFM card, thereby eliminating a lot of stuff from c1rfm.
Requires:
- dolphin card for c1ioo (do the old Sun machines support these? If they don't, we could swap the old Sun machine for one of the spare aLIGO-approved Supermicro machines that we have)
- dolphin fibre to go to dolphin switch in 1X3 rack
b2) OR, we could move c1ioo to 1X4 with c1lsc and c1sus, and get a OneStop fibre cable to connect to its IO chassis. We would still need a dolphin card, but we could use copper instead of fibre. This is my preferred solution, since it moves c1ioo out of 1X1, where it's really in the way and making a lot of noise. It would also be easier to manage all the machines if they're together in one rack.
Requires:
- dolphin card for c1ioo
- dolphin copper cable for c1ioo
- OneStop fibre for c1ioo
c) put another cpu in c1sus
c1sus is (I believe) able to support another 6-core cpu. If we added more cores to c1sus, we could break up c1rfm into c1rfm0, c1rfm1, etc. This is a less elegant solution imho, but it would probably do the job.
Requires:
|
Attachment 1: hosts.png
|
|
9076
|
Tue Aug 27 20:43:34 2013 |
Koji | Configuration | CDS | front end IPC configuration |
The reason we had the PCIe/RFM system was to test this mixed configuration prior to the actual implementation at the sites.
Has this configuration been intensively tested at the site with a practical configuration?
Quote: |
Attached is a graph of my rough accounting of the intended direct IPC connections between the front ends.
|
It's hard to believe that c1lsc -> c1sus only has 4 channels. We actuate ITMX/Y/BS/PRM/SRM for the length control.
In addition to these, we control the angles of ITMX/Y/BS/PRM (and SRM in future) via c1ass model on c1lsc.
So there should be at least 12 connections (and more as I ignored MCL).
I personally prefer to give the PCIe card to c1ioo and move the RFM card to c1lsc.
But in either case, we want to quantitatively compare what the current configuration is (not omitting the bridging by c1rfm)
and what the future configuration will be, including the additional channels we want to add in the near future,
because RFM connections are really costly and moving the RFM card to c1lsc may simply cause c1lsc to time out
instead of c1sus. |
9077
|
Wed Aug 28 00:41:23 2013 |
Jenne | Update | CDS | CDS svn commits not happening |
svn status update. asx, als and ioo were found not committed. Not sure about who modified ioo last after Jenne.
//edit Manasa - edited the/ elog instead of replying // |
9079
|
Wed Aug 28 05:21:58 2013 |
manasa | Update | CDS | CDS svn commits not happening |
I am responsible for missed svn commits with als and asx. I have committed them.
But I have not modified anything with ioo in the last few weeks.
|
9086
|
Wed Aug 28 19:47:28 2013 |
jamie | Configuration | CDS | front end IPC configuration |
Quote: |
It's hard to believe that c1lsc -> c1sus only has 4 channels. We actuate ITMX/Y/BS/PRM/SRM for the length control.
In addition to these, we control the angles of ITMX/Y/BS/PRM (and SRM in future) via c1ass model on c1lsc.
So there should be at least 12 connections (and more as I ignored MCL).
|
Koji was correct that I missed some connections from c1lsc to c1sus. I corrected the graph in the original post.
Also, I should have noted that the graph doesn't actually include everything that we now have. I left out all the simplant stuff, which adds extra connections between c1lsc and c1sus, mostly because the sus simplant is being run on c1lsc only because there was no space on c1sus. That should be corrected, either by moving c1rfm to c1lsc, or by adding a new core to c1sus.
I also spoke to Rolf today about the possibility of getting a OneStop fiber and dolphin card for c1ioo. We should be able to order the dolphin card and cable with no problem. As for the OneStop, we might have to borrow a new fiber-supporting card from India, then send our current card to OneStop for fiber-supporting modifications. It sounds kind of tricky. I'll post more as I figure things out.
Rolf also said that in newer versions of the RCG, the performance of the RFM direct memory access (DMA) has improved considerably, which substantially reduces the model run-time delay involved in using the RFM. In other words, the long-awaited RCG upgrade might alleviate some of our IPC woes.
We need to upgrade the RCG to the latest release (2.7) |
9087
|
Wed Aug 28 23:09:55 2013 |
jamie | Configuration | CDS | code to generate host IPC graph |
|
Attachment 1: hosts.png
|
|
Attachment 2: 40m-ipcs-graph.py
|
#!/usr/bin/env python
# ipc connections: (from, to, number)
ipcs = [
('c1scx', 'c1lsc', 1),
('c1scy', 'c1lsc', 1),
('c1oaf', 'c1lsc', 8),
('c1scx', 'c1ass', 1),
('c1scy', 'c1ass', 1),
... 96 more lines ...
|
9137
|
Wed Sep 18 11:29:43 2013 |
manasa | Update | CDS | Dataviewer cannot connect to fb |
Masayuki pointed out that dataviewer wasn't connecting to the fb this morning.
When I started dataviewer from the terminal I obtained the following error:
controls@pianosa:~ 0$ dataviewer
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Error in obtaining chan info.
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
I checked the CDS FE status screen and it looks normal. I could ping the fb and ssh to it as well.
I restarted fb to see if it made any difference. telnet fb 8088
It hasn't helped. Anything else that can be done??

|
9138
|
Wed Sep 18 11:52:53 2013 |
Jamie | Update | CDS | Dataviewer cannot connect to fb |
Quote: |
Masayuki pointed out that dataviewer wasn't connecting to the fb this morning.
When I started dataviewer from the terminal I obtained the following error:
controls@pianosa:~ 0$ dataviewer
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Error in obtaining chan info.
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
I checked the CDS FE status screen and it looks normal. I could ping the fb and ssh to it as well.
I restarted fb to see if it made any difference. telnet fb 8088
It hasn't helped. Anything else that can be done??
|
I've fixed the problem. This was due to a change I made in the NDSSERVER environment variable so that it would work with cdsutils. I didn't realize there was an incompatibility with how dataviewer parses NDSSERVER. Joe and I will have to figure it out.
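The error text in the quoted output ("Can't find hostname `fb:8088'") suggests that the whole "host:port" string was handed to the name lookup. A minimal sketch of splitting such a value before resolving it, assuming the value is either a plain hostname or "host:port". The formats actually accepted by dataviewer and cdsutils are not confirmed here, `split_host_port` is a hypothetical helper, and the default port 8088 is simply taken from the error message.

```python
def split_host_port(spec, default_port=8088):
    """Split 'host' or 'host:port' into (host, port).

    ASSUMPTION: the value is a single plain hostname with an optional
    ':port' suffix; comma-separated lists or IPv6 literals are not handled.
    """
    host, sep, port = spec.strip().partition(':')
    if not host:
        raise ValueError(f"no hostname in {spec!r}")
    # Use the explicit port if one was given, otherwise the default.
    return host, int(port) if sep else default_port


print(split_host_port('fb:8088'))  # ('fb', 8088)
print(split_host_port('fb'))       # ('fb', 8088)
```

Only the host part would then be passed to the resolver, so a value that carries a port no longer breaks the lookup.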
In the meantime I've changed things back so that dataviewer should now work as expected. You might have to log out and back in for it to work (or at least open a new terminal). |
9182
|
Tue Oct 1 14:12:22 2013 |
rana | Summary | CDS | svndumpfilter on linux1 makes NFS slow |
Yesterday and this morning's slow NFS disk access was caused by 'svndumpfilter' being run at linux1 to carve out the Noise Budget directory. It is being moved to another server; I think the disk access is back to normal speed now. |
9184
|
Tue Oct 1 19:42:19 2013 |
rana | Summary | CDS | megatron upgrade |
Max and I started upgrading megatron to Ubuntu 12.NN today. We were having some trouble getting the latest Python code to run to support the Summary pages stuff.
It's also a nice test to see what CDS tools fail on there, before we upgrade the workstations to Ubuntu 12.
Since it's Linux, none of the usual upgrading commands worked, but after an hour or so of reading forums we were able to delete some packages and all the 3rd party packages and get the upgrade to go ahead. We'll have to re-install the LSC, GDS, LAL repos to get it back into shape and get NDS2 working. The upgrade is running in a 'screen' command on there.
Wed Oct 02 14:50:16 2013
Update #1: The upgrade asks a couple dozen questions so it doesn't proceed by itself. I've been checking in to the 'screen' every couple hours to type in 'Yes' to let it keep going.
Update #2: It finished a few hours ago:
controls@megatron:~ 0$ uname -a
Linux megatron 3.2.0-54-generic #82-Ubuntu SMP Tue Sep 10 20:08:42 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
controls@megatron:~ 0$ date
Wed Oct 2 18:33:41 PDT 2013
|
9259
|
Tue Oct 22 18:55:55 2013 |
rana | Update | CDS | Workstation swap: Rosalba to ??? |
We got a new computer from Xi computer corp. I am currently installing Ubuntu 10.04 LTS on to it to start with and then will move on to 12 if we can figure out a way to test it besides "I guess it should work?"
Rosalba has been removed and put onto the old Jamie desk. Old Jamie desk also has a Mac Mini running on there.
At the meeting tomorrow we need to decide on a new Italian baby girl name for this new machine. |
9271
|
Wed Oct 23 22:11:20 2013 |
rana | Update | CDS | Workstation swap: Rosalba to ??? |
I've finished setting up the fstab on Chiara and the upgrade to Ubuntu 12 seems to have gone well enough. She's fast:

but I forgot to make sure to order a dual head graphics card for it. So we'll order some dual DVI gaming card that the company recommends. Until then, it's only one monitor.
Still, it's ready for testing control room tools on. If everything works OK for a couple of weeks, we can go to 12 on all the other ones. |
9278
|
Thu Oct 24 12:00:11 2013 |
jamie | Update | CDS | fb acquisition of slow channels |
Quote: |
While that would be good - it doesn't address the EDCU problem at hand. After some verbal emailing, Jamie and I find that the master file in target/fb/ actually doesn't point to any of the EDCU files created by any of the FE machines. It is only using the C0EDCU.ini as well as the *_SLOW.ini files that were last edited in 2011 !!!
So....we have not been adding SLOW channels via the RCG build process for a couple years. Tomorrow morning, Jamie will edit the master file and fix this unless I get to it tonight. There a bunch of old .ini files in the daq/ dir that can be deleted too.
|
I took a look at the situation here so I think I have a better idea of what's going on (it's a mess, as usual):
The framebuilder looks at the "master" file
/opt/rtcds/caltech/c1/target/fb/master
which lists a bunch of other files that contain lists of channels to acquire. It looks like there might have been some notion to just use
/opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini
as the master slow channels file. Slow channels from all over the place have been added to this file, presumably by hand. Maybe the idea was to just add slow channels manually as needed, instead of recording them all by default. The full slow channels lists are in the
/opt/rtcds/caltech/c1/chans/daq/C1EDCU_<model>.ini
files, none of which are listed in the fb master file.
There are also these old slow channel files, like
/opt/rtcds/caltech/c1/chans/daq/SUS_SLOW.ini
There's a perplexing breakdown of channels spread out between these files and C0EDCU.ini:
controls@fb ~ 0$ grep MC3_URS /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini
[C1:SUS-MC3_URSEN_OVERFLOW]
[C1:SUS-MC3_URSEN_OUTPUT]
controls@fb ~ 0$ grep MC3_URS /opt/rtcds/caltech/c1/chans/daq/MCS_SLOW.ini
[C1:SUS-MC3_URSEN_INMON]
[C1:SUS-MC3_URSEN_OUT16]
[C1:SUS-MC3_URSEN_EXCMON]
controls@fb ~ 0$
Why some of these channels are in one file and some in the other I have no idea. If the fb finds multiple copies of the same channel it will fail to start, so at least we've been diligent about keeping disparate lists in the different files.
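Since a repeated channel stops the fb from starting, it may help to check for duplicates before adding more files to the master list. A minimal sketch, assuming each channel appears as a bracketed section header as in the grep output above. The `[default]` handling is an assumption about the file layout, the helper names are hypothetical, and EXAMPLE.ini is a made-up file used only to show a hit.

```python
import re
from collections import defaultdict

SECTION = re.compile(r'^\[([^\]]+)\]\s*$')


def channels(ini_text):
    """Return the channel names ([SECTION] headers) found in an .ini body.

    ASSUMPTION: a [default] section, if present, is not a channel.
    """
    names = []
    for line in ini_text.splitlines():
        m = SECTION.match(line.strip())
        if m and m.group(1).lower() != 'default':
            names.append(m.group(1))
    return names


def find_duplicates(files):
    """Map each channel defined more than once to the files that define it.

    `files` maps a file name to its text content.
    """
    seen = defaultdict(list)
    for fname, text in files.items():
        for ch in channels(text):
            seen[ch].append(fname)
    return {ch: names for ch, names in seen.items() if len(names) > 1}


files = {
    'C0EDCU.ini': "[C1:SUS-MC3_URSEN_OVERFLOW]\n[C1:SUS-MC3_URSEN_OUTPUT]\n",
    'MCS_SLOW.ini': "[C1:SUS-MC3_URSEN_INMON]\n[C1:SUS-MC3_URSEN_OUT16]\n"
                    "[C1:SUS-MC3_URSEN_EXCMON]\n",
}
no_dups = find_duplicates(files)
print(no_dups)  # {}

# Made-up extra file that repeats one channel, to show what a hit looks like.
files['EXAMPLE.ini'] = "[C1:SUS-MC3_URSEN_OUTPUT]\n"
dups = find_duplicates(files)
print(dups)  # {'C1:SUS-MC3_URSEN_OUTPUT': ['C0EDCU.ini', 'EXAMPLE.ini']}
```

In practice the `files` dictionary would be filled by reading every file named in the master list.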
So I guess the question is if we want to automatically record all slow channels by default, in which case we add in the C1EDCU_<model>.ini files, or if we want to keep just adding them in by hand, in which case we keep the status quo. In either case we should probably get rid of the *_SLOW.ini files (by maybe integrating their channels in C0EDCU.ini), since they're old and just confusing things.
In the meantime, I added C1:FEC-45_CPU_METER to C0EDCU.ini, so that we can keep track of the load there.
|
9282
|
Thu Oct 24 17:26:35 2013 |
jamie | Update | CDS | new dataviewer installed; 'cdsutils avg' now working. |
I installed a new version of dataviewer (2.3.2), and at the same time fixed the NDSSERVER issue we were having with cdsutils. They should both be working now.
The problem turned out to be that I had set up our dataviewer to use the NDSSERVER environment variable, whereas by default it uses the LIGONDSIP variable. Why we have two different environment variables that mean basically exactly the same thing, who knows. |
9285
|
Thu Oct 24 23:12:21 2013 |
jamie | Update | CDS | new dataviewer installed; no longer works on Ubuntu 10 workstations |
Quote: |
I installed a new version of dataviewer (2.3.2), and at the same time fixed the NDSSERVER issue we were having with cdsutils. They should both be working now.
The problem turned out to be that I had setup our dataviewer to use the NDSSERVER environment, whereas by default it uses the LIGONDSIP variable. Why we have two different environment variables that mean basically exactly the same thing, who knows.
|
Dataviewer seems to run fine on Chiara (Ubuntu 12), but not on Rossa or Pianosa (Ubuntu 10), or Megatron, which I assume is also something medium-old.
We get the error:
controls@megatron:~ 0$ dataviewer
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Error in obtaining chan info.
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
Sadface :( We also get the popup saying "Couldn't connect to fb:8088" |
9287
|
Thu Oct 24 23:30:57 2013 |
jamie | Update | CDS | new dataviewer installed; no longer works on Ubuntu 10 workstations |
Quote: |
Quote: |
I installed a new version of dataviewer (2.3.2), and at the same time fixed the NDSSERVER issue we were having with cdsutils. They should both be working now.
The problem turned out to be that I had setup our dataviewer to use the NDSSERVER environment, whereas by default it uses the LIGONDSIP variable. Why we have two different environment variables that mean basically exactly the same thing, who knows.
|
Dataviewer seems to run fine on Chiara (Ubuntu 12), but not on Rossa or Pianosa (Ubuntu 10), or Megatron, which I assume is also something medium-old.
We get the error:
controls@megatron:~ 0$ dataviewer
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Warning: Not all children have same parent in XtManageChildren
Error in obtaining chan info.
Can't find hostname `fb:8088'
Can't find hostname `fb:8088'; gethostbyname(); error=1
Sadface :( We also get the popup saying "Couldn't connect to fb:8088"
|
Sorry, that was a goof on my part. It should be working now. |
9288
|
Fri Oct 25 01:46:33 2013 |
rana | Update | CDS | fb acquisition of slow channels |
Rather than limp along with a broken SLOW channel system, I fixed it so that the EDCU files made during the RCG build actually get used and added to the channel list (and thereby available in DV and trends).
I first started by adding all of the EDCU files. This completely fails; daqd just doesn't start and gives some weird exceptions.
So I removed a bunch of them and it runs OK now with ~15000 channels. Previously we had ~1500 slow channels.
An in-between config tonight had ~58000 channels and was also running fine, but the connection to the FB would time out when using DV after several minutes. Possibly we can fix this by adding some more RAM to the FB (the DAQD process uses up 45% of the CPU and 39% of the 8 GB of RAM).
Another issue in getting this to work was that there were a bunch of channel name conflicts between the old C0EDCU.ini and the sub-system EDCU files that I was trying to add. I went through by hand and deleted all of the duplicates from the old file. The new frame files are 80 MB, the old ones were 66 MB.
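The change in frame size quoted above can be turned into a quick estimate of the extra disk load. A small sketch follows. The frame length and the free space on /frames are not given in this entry, so they are left as parameters, and both function names are hypothetical.

```python
def growth_percent(old_mb, new_mb):
    """Percentage increase in frame file size."""
    return 100.0 * (new_mb - old_mb) / old_mb


def days_until_full(free_gb, frame_mb, frame_seconds):
    """Days until `free_gb` of disk is used up by frames of `frame_mb`
    written every `frame_seconds`, if nothing is ever deleted."""
    mb_per_day = frame_mb * 86400.0 / frame_seconds
    return free_gb * 1024.0 / mb_per_day


print(round(growth_percent(66, 80), 1))  # 21.2
```

So the new frames are about 21% larger than the old ones. Whatever mechanism trims /frames would need to free space correspondingly more often.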
I hope that /frames doesn't become full - not sure how that is wiped... |
9297
|
Sat Oct 26 22:48:55 2013 |
rana | Update | CDS | New/old CDS laptop for X-End |
I made the Yoichi laptop into a CDS laptop called 'asia' a few months ago. Somehow I mistakenly gave it the IP address of our little Acer laptop, which is called 'farfalla'. This made farfalla's network not work. I put the old Dell Aldabella by the PMC where farfalla was and am now upgrading farfalla from CentOS to Ubuntu 10.04 LTS 32-bit. I have updated the host table on linux1 to give farfalla the 230 IP address and let 'asia' keep 225. |
9302
|
Mon Oct 28 12:53:23 2013 |
Jenne | Update | CDS | Farfalla and Asia added to Host Table in Wiki |
Quote: |
I have updated the host table on linux1 to give farfalla the 230 IP address and let 'asia' keep 225.
|
Neither of these computers was listed in the Martian Host Table in the wiki, so I put them on there. It's handy to keep this updated, so that we know what IP addresses are available. |
9308
|
Tue Oct 29 16:51:31 2013 |
Jenne | Update | CDS | LSC test points were used up |
Masayuki was concerned that some LSC channels were giving him all zeros. After seeing the error in the terminal window running dataviewer (it said something like 'daqd overloaded'), I checked the lsc model, and sure enough, all the test points were used.
So, I found an entry by Jamie (elog 8431) where he reminds us how to clear the test points. I followed the instructions, and now we're seeing real data (not digital zeros) again. |
9354
|
Wed Nov 6 15:12:01 2013 |
Jenne | Update | CDS | FB not talking to LSC? |
Something funny is going on with the framebuilder's communication with the LSC machine.
This is a different failure mode / error than I have seen before. It's not the type of problem that is solved by restarting the mxstreams (that type is indicated by the 2 blocks on top of one another, which are green on the lsc machine right now, also being red). I did try restarting them before I looked closer and realized that that wasn't the problem.
ssh-ing to c1lsc, and doing a "rtcds restart all" seems to be fixing the problem. Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually. All of the models on the lsc machine are running fine now, though.
Here's a screenshot of the CDS overview screen, with the error lights:

|
9357
|
Wed Nov 6 17:21:58 2013 |
Jamie | Update | CDS | FB not talking to LSC? |
Quote: |
Something funny is going on with the framebuilder's communication with the LSC machine.
This is a different failure mode / error than I have seen before. It's not the type of problem that is solved by restarting the mxstreams (that is indicated by also the 2 blocks on top of one another, that are green on the lsc machine right now, being red), although I did try that, before I looked closer and realized that that wasn't the problem.
ssh-ing to c1lsc, and doing a "rtcds restart all" seems to be fixing the problem. Both c1oaf and c1cal needed another round of restarting, because they needed their BURT buttons pressed manually. All of the models on the lsc machine are running fine now, though.
Here's a screenshot of the CDS overview screen, with the error lights:

|
This definitely looks like a timing problem on the c1lsc front end computer. The red lights on the left mean that the timing synchronization is lost at the user model. I'm perplexed why it looks like the IOP is not seeing the same error, though, since it should originate at the ADC. The red lights to the right just mean the timing synchronization is lost with the DAQ, which is to be expected given a timing loss at the front end.
We'll have to take a closer look when this happens again. |
9364
|
Mon Nov 11 12:19:36 2013 |
rana | Update | CDS | FE Web view was fixed |
Quote: |
FE Web view was broken for a long time. It was fixed now.
The problem was that path names were not fixed when we moved the models from the old local place to the SVN structure.
The auto updating script (/cvs/cds/rtcds/caltech/c1/scripts/AutoUpdate/update_webview.cron ) is running on Mafalda.
Link to the web view: https://nodus.ligo.caltech.edu:30889/FE/
|
Seems partially broken again. Not updating for most of the FE. I've commented out the cron lines for this as well as the mostly broken MEDM Snapshots job. I'm in the process of adding them to the megatron cron (since that machine is at least running 64 bit Ubuntu 12, instead of 32-bit CentOS) |
9366
|
Tue Nov 12 15:04:35 2013 |
rana | Update | CDS | FE Web view was fixed |
Quote: |
Seems partially broken again. Not updating for most of the FE. I've commented out the cron lines for this as well as the mostly broken MEDM Snapshots job. I'm in the process of adding them to the megatron cron (since that machine is at least running 64 bit Ubuntu 12, instead of 32-bit CentOS)
|
https://nodus.ligo.caltech.edu:30889/medm/screenshot.html
Seems to now be working. I made several fixes to the scripts to get it working again:
- changed TCSH scripts to BASH. Used /usr/bin/env to find bash.
- fixed stdout and stderr redirection so that we could see all error messages.
- made the PERL scripts executable. most of the PERL errors are not being logged yet.
- fixed paths for the MEDM screens to point to the right directories.
- the screen cap only works on screens which pop open on the left monitor, so I edited the screens so that they open up there by default.
- moved the CRON jobs from mafalda over to megatron. Mafalda no longer is running any crons.
- op540m used to run the 3 projector StripTool displays and have its screen dumped for this web page. Now zita is doing it, but I don't know how to make zita dump her screen.
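As an illustration of the stdout/stderr redirection fix in the list above, here is a minimal sketch of running a command so that both streams end up in the same log file. It is written in Python rather than the bash used by the actual scripts, and `run_logged` and the log path are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_logged(cmd, log_path):
    """Run `cmd`, appending its stdout and stderr to `log_path`.

    Returns the command's exit status.
    """
    with open(log_path, 'a') as log:
        log.write(f"$ {' '.join(cmd)}\n")
        log.flush()  # keep the header ahead of the child's output
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example: a child process that writes one line to each stream.
log_path = os.path.join(tempfile.mkdtemp(), 'example.log')
child = 'import sys; print("out"); print("err", file=sys.stderr)'
status = run_logged([sys.executable, '-c', child], log_path)
with open(log_path) as f:
    log_text = f.read()
print(status)
print(log_text)
```

Sending stderr to the same file as stdout is what makes error messages from a cron job visible afterwards, since a cron job has no terminal to print them on.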
|
9375
|
Wed Nov 13 18:02:08 2013 |
Jenne | Update | CDS | Can't talk to AUXEY? |
The restore scripts from the IFO config screen half-failed, with this error:
retrying (1/5)...
retrying (2/5)...
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Wed Nov 13 2013 17:24:00.389261330
..................................................................
Jamie, do you know what this might be? When requested, ETMY was not misaligned or restored, but we got these errors. So, somehow we're not talking properly to EY, but other things seem fine (the models are running okay, the suspension is damped, etc, etc.) |
9387
|
Thu Nov 14 22:23:22 2013 |
Jenne | Update | CDS | Can't talk to AUXEY? |
Quote: |
The restore scripts from the IFO config screen half-failed, with this error:
retrying (1/5)...
retrying (2/5)...
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Wed Nov 13 2013 17:24:00.389261330
..................................................................
Jamie, do you know what this might be? When requested, ETMY was not misaligned or restored, but we got these errors. So, somehow we're not talking properly to EY, but other things seem fine (the models are running okay, the suspension is damped, etc, etc.)
|
This problem is now worse - the sliders on IFO_ALIGN for ETMY are white. I can't telnet to the machine either, although auxex works okay. Rather, it looks like maybe I'm getting to auxey, but then I'm immediately booted. I can ping both c1auxex and c1auxey with no problem.
Heeeeelllp please. Is this just a "shut off, then turn back on" problem? I'm wary of hard rebooting things, with all the warnings and threats in the elog lately. I've sent an email to Jamie to ping him.
There are some vague instructions in the wiki, but they begin at doing the burt restores, not actually restarting the computers (wiki). Back in July, elog 8858 was written, on which the wiki instructions seem to be based. But in the elog it says "...went to the /cvs/cds/caltech/target/ area and started to (one by one) inspect all of the targets to see if they were alive.", but I don't know what "inspected" means in this case. I probably should, since I've been here for something like a millennium, but I don't.
controls@rossa:~ 0$ telnet c1auxey
Trying 192.168.113.60...
Connected to c1auxey.martian.
Escape character is '^]'.
Connection closed by foreign host.
controls@rossa:~ 1$ telnet c1auxex
Trying 192.168.113.59...
Connected to c1auxex.martian.
Escape character is '^]'.
c1auxex >
telnet> ^]
?Invalid command
telnet> exit
?Invalid command
telnet> quit
Connection closed.
controls@rossa:~ 0$ telnet c1auxey
Trying 192.168.113.60...
Connected to c1auxey.martian.
Escape character is '^]'.
Connection closed by foreign host.
|
9391
|
Fri Nov 15 10:19:26 2013 |
manasa | Update | CDS | Can't talk to AUXEY? |
Quote: |
This problem is now worse - the sliders on IFO_ALIGN for ETMY are white. I can't telnet to the machine either, although auxex works okay. Rather, it looks like maybe I'm getting to auxey, but then I'm immediately booted. I can ping both c1auxex and c1auxey with no problem.
Heeeeelllp please. Is this just a "shut off, then turn back on" problem? I'm wary of hard rebooting things, with all the warnings and threats in the elog lately. I've sent an email to Jamie to ping him.
There are some vague instructions in the wiki, but they begin at doing the burt restores, not actually restarting the computers: wiki. Back in July, elog 8858 was written, on which the wiki instructions seem to be based. But the elog says "...went to the /cvs/cds/caltech/target/ area and started to (one by one) inspect all of the targets to see if they were alive.", and I don't know what "inspected" means in this case. I probably should, since I've been here for something like a millennium, but I don't.
|
This is what was done (as I recollect) when we said "inspected": telnet into each computer, ping it, and look at its status. Since c1auxey is not responding, here is how c1auxex responds.
controls@rossa:/cvs/cds/caltech/target 0$ telnet c1auxex
Trying 192.168.113.59...
Connected to c1auxex.martian.
Escape character is '^]'.
c1auxex > h
1 i
2 -help
3 --help
4 h
5 2
6 h
7 -help
8 i
9 h
value = 0 = 0x0
c1auxex > i
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask _excTask fde244 0 PEND 87094 fde1ac 3006b 0
tLogTask _logTask fdb944 0 PEND 87094 fdb8a8 0 0
tShell _shell ddad00 1 READY 6d974 dda9c8 3d0001 0
tRlogind _rlogind fbc11c 2 PEND 2b604 fbbdf4 0 0
tTelnetd _telnetd fba278 2 PEND 2b604 fba1a8 0 0
tTelnetOutT_telnetOutTa db7578 2 READY 2b604 db72e0 0 0
tTelnetInTa_telnetInTas db6060 2 READY 2b5dc db5d68 0 0
callback _callbackTas f7941c 40 PEND 2b604 f793d4 0 0
scanEvent ee7ca8 ecacb4 41 PEND 2b604 ecac6c 0 0
tNetTask _netTask fd75b8 50 READY 6be6c fd7550 0 0
scanPeriod ee78f8 ecd554 53 READY 6d192 ecd508 0 0
scanPeriod ee78f8 f23e48 54 DELAY 6d192 f23dfc 0 6
tFtpdTask _ftpdTask fb7848 55 PEND 2b604 fb778c 0 0
scanPeriod ee78f8 f266e8 55 READY 6d192 f2669c 0 0
scanPeriod ee78f8 f38678 56 READY 6d192 f3862c 0 0
callback _callbackTas f7bcbc 57 PEND 2b604 f7bc74 0 0
scanPeriod ee78f8 f906d8 57 DELAY 6d192 f9068c 0 59
scanPeriod ee78f8 f995ac 58 DELAY 6d192 f99560 0 238
scanPeriod ee78f8 f9c908 59 DELAY 6d192 f9c8bc 0 538
callback _callbackTas fa4c1c 65 PEND 2b604 fa4bd4 0 0
scanOnce ee7764 f9f96c 65 PEND 2b604 f9f92c 0 0
epicsPrint f0501c e88fa0 70 PEND 2b604 e88f64 c0002 0
ts_Casync ee5bae f76b7c 70 DELAY 6d192 f76880 3d0004 178
tPortmapd _portmapd fb8d60 100 PEND 2b604 fb8c2c 16 0
EgRam ea00e4 fa14ac 100 PEND 2b604 fa1458 0 0
CA client _camsgtask d85878 180 PEND 2b604 d85774 3d0004 0
CA client _camsgtask df91e8 180 PEND 2b604 df90e4 0 0
CA client _camsgtask d98bf4 180 PEND 2b604 d98af0 0 0
CA client _camsgtask e03cd0 180 PEND 2b604 e03bcc 0 0
CA client _camsgtask ddf2b8 180 PEND 2b604 ddf1b4 0 0
CA client _camsgtask faaec8 180 PEND 2b604 faadc4 0 0
CA client _camsgtask d79f3c 180 PEND 2b604 d79e38 0 0
CA TCP _req_server f305dc 181 PEND 2b604 f30540 0 0
CA repeaterf109e2 f215a8 181 PEND 2b604 f21474 0 0
CA event _event_task d7fe58 181 PEND 2b604 d7fe10 0 0
CA event _event_task d6ce5c 181 PEND 2b604 d6ce14 0 0
CA event _event_task dab7e0 181 PEND 2b604 dab798 0 0
CA event _event_task d76efc 181 PEND 2b604 d76eb4 0 0
CA event _event_task d9bddc 181 PEND 2b604 d9bd94 0 0
CA event _event_task d9a864 181 PEND 2b604 d9a81c 0 0
CA event _event_task da8d8c 181 PEND 2b604 da8d44 0 0
CA UDP _cast_server f2f064 182 READY efcabe f2efe4 0 0
CA online _rsrv_online f2d84c 183 DELAY 6d192 f2d7bc 0 265
EV save_res_event_task de88dc 189 PEND 2b604 de8894 3006b 0
save_restor_save_restor df61cc 190 PEND 2b604 df5c44 3d0002 0
RD save_res_cac_recv_ta fb47d8 191 READY 2b604 fb46a4 3d0004 0
logRestart f05d42 e861c0 200 PEND+T 2b604 e86174 33 1714
taskwd ef4d46 e85030 200 DELAY 6d192 e84f7c 0 224
value = 0 = 0x0
c1auxex >
telnet> quit
Connection closed.
controls@rossa:/cvs/cds/caltech/target 0$ |
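For anyone repeating this kind of "inspection" on several targets, the task listing above can be summarized rather than read by eye. The following is only a sketch, assuming the output of `i` has been captured to a string; the function name `task_states` and the sample text are illustrative and not part of any 40m script. It matches the trailing PRI/STATUS/PC/SP/ERRNO/DELAY columns, because the NAME and ENTRY columns sometimes contain spaces or run together.

```python
import re
from collections import Counter

# Tail of one row of VxWorks `i` output: PRI STATUS PC SP ERRNO DELAY.
# Anchoring on the tail sidesteps the NAME/ENTRY columns, which may contain
# spaces ("CA client") or be fused together ("CA repeaterf109e2").
ROW_TAIL = re.compile(
    r"\b(?P<pri>\d+)\s+(?P<status>[A-Z]+(?:\+[A-Z]+)?)\s+"
    r"[0-9a-f]+\s+[0-9a-f]+\s+[0-9a-f]+\s+\d+\s*$"
)


def task_states(listing):
    """Count tasks per STATUS value in captured `i` output."""
    counts = Counter()
    for line in listing.splitlines():
        match = ROW_TAIL.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# A few rows copied from the c1auxex listing, used here as sample input.
SAMPLE = """\
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
tExcTask _excTask fde244 0 PEND 87094 fde1ac 3006b 0
tShell _shell ddad00 1 READY 6d974 dda9c8 3d0001 0
CA repeaterf109e2 f215a8 181 PEND 2b604 f21474 0 0
logRestart f05d42 e861c0 200 PEND+T 2b604 e86174 33 1714
taskwd ef4d46 e85030 200 DELAY 6d192 e84f7c 0 224
value = 0 = 0x0
"""

states = task_states(SAMPLE)
```

Header, separator, and `value = ...` lines are ignored because they do not end in the six expected columns. Which status values indicate an unhealthy target is left to the reader; the sketch only counts them.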
9393
|
Fri Nov 15 10:49:55 2013 |
jamie | Update | CDS | Can't talk to AUXEY? |
Please just try rebooting the vxworks machine. I think there is a key on the card or crate that will reset the device. These machines are "embedded", so they're designed to be hard reset. Don't worry, just restart the damn thing and see if that fixes the problem. |
9394
|
Fri Nov 15 12:00:28 2013 |
Koji | Update | CDS | Can't talk to AUXEY? |
Quote: |
Please just try rebooting the vxworks machine. I think there is a key on the card or crate that will reset the device. These machines are "embedded", so they're designed to be hard reset. Don't worry, just restart the damn thing and see if that fixes the problem.
|
Don't forget to run burtrestore for the target. |
9395
|
Fri Nov 15 12:38:50 2013 |
Jenne | Update | CDS | Can't talk to AUXEY? |
Quote: |
Please just try rebooting the vxworks machine. I think there is a key on the card or crate that will reset the device. These machines are "embedded", so they're designed to be hard reset. Don't worry, just restart the damn thing and see if that fixes the problem.
|
This is what I remember doing all the time when Rob was around, but with all the new computers, I forgot whether or not this was allowed for the slow computers.
Anyhow, I went down there and keyed the crate, but auxey isn't coming back. I'll give it a few more minutes and check again, but then I might go and power cycle it again. If that doesn't work, we may have a much bigger problem. |
9396
|
Fri Nov 15 13:26:00 2013 |
Jenne | Update | CDS | AUXEY is back |
Quote: |
Quote: |
Please just try rebooting the vxworks machine. I think there is a key on the card or crate that will reset the device. These machines are "embedded", so they're designed to be hard reset. Don't worry, just restart the damn thing and see if that fixes the problem.
|
This is what I remember doing all the time when Rob was around, but with all the new computers, I forgot whether or not this was allowed for the slow computers.
Anyhow, I went down there and keyed the crate, but auxey isn't coming back. I'll give it a few more minutes and check again, but then I might go and power cycle it again. If that doesn't work, we may have a much bigger problem.
|
I went and keyed the crate again, and this time the computer came back. I burt restored to Nov 10th. ETMY is damping again. |
9402
|
Mon Nov 18 21:20:54 2013 |
Jenne | Update | CDS | Can't talk to AUXEY? |
Quote: |
The restore scripts from the IFO config screen half-failed, with this error:
retrying (1/5)...
retrying (2/5)...
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Wed Nov 13 2013 17:24:00.389261330
..................................................................
Jamie, do you know what this might be? When requested, ETMY was not misaligned or restored, but we got these errors. So, somehow we're not talking properly to EY, but other things seem fine (the models are running okay, the suspension is damped, etc, etc.)
|
The auxey machine is back, in that I can interact with the IFO_ALIGN sliders, and they actually make the optic move, but I still can't read from or write to the EPICS channels:
controls@rossa:/opt/rtcds/caltech/c1/medm/MISC/ifoalign/burt 0$ cdsutils read C1:SUS-ETMY_PIT_COMM
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Mon Nov 18 2013 21:13:52.044973819
..................................................................
Could not connect to channel (timeout=2s): C1:SUS-ETMY_PIT_COMM
controls@rossa:/opt/rtcds/caltech/c1/medm/MISC/ifoalign/burt 1$ cdsutils read C1:SUS-ETMY_YAW_COMM
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Mon Nov 18 2013 21:14:07.040168660
..................................................................
Could not connect to channel (timeout=2s): C1:SUS-ETMY_YAW_COMM
controls@rossa:/opt/rtcds/caltech/c1/medm/MISC/ifoalign/burt 1$
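When many channels are checked at once, it can help to boil output like the above down to "which server, which channels". The sketch below assumes the combined output of the failing reads has been captured as text; `summarize_ca_errors` is a hypothetical helper name, and the regular expressions are written only against the message layout shown in this entry.

```python
import re

# Pairs of Warning/Context lines from a Channel Access client exception block.
CA_EXCEPTION = re.compile(
    r'Warning:\s*"(?P<warning>[^"]*)"\s*\n\s*Context:\s*"(?P<context>[^"]*)"'
)
# The per-channel failure line printed after the exception block.
UNREACHABLE = re.compile(r"Could not connect to channel \(timeout=\d+s\): (\S+)")


def summarize_ca_errors(output):
    """Return (set of (warning, context) pairs, list of unreachable channels)."""
    problems = {
        (m.group("warning"), m.group("context"))
        for m in CA_EXCEPTION.finditer(output)
    }
    channels = UNREACHABLE.findall(output)
    return problems, channels


# Sample input copied from the two failed reads above.
SAMPLE = """\
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Mon Nov 18 2013 21:13:52.044973819
..................................................................
Could not connect to channel (timeout=2s): C1:SUS-ETMY_PIT_COMM
CA.Client.Exception...............................................
Warning: "Virtual circuit disconnect"
Context: "c1auxey.martian:5064"
Source File: ../cac.cpp line 1214
Current Time: Mon Nov 18 2013 21:14:07.040168660
..................................................................
Could not connect to channel (timeout=2s): C1:SUS-ETMY_YAW_COMM
"""

problems, channels = summarize_ca_errors(SAMPLE)
```

For the sample, both failures collapse to a single warning/context pair, which makes it obvious that one server (c1auxey) is behind every unreachable channel.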
This is also causing trouble for the BURT save and BURT restore scripts, that are called from the IFO_ALIGN screen. If I look at the log that is written from an attempted 'save' of the slider values, I see:
**** READ BURT LOGFILE
--- Start processing files
file >/opt/rtcds/caltech/c1/medm/MISC/ifoalign/burt/ETMY.req<
preprocessing ... done
pv >C1:SUS-ETMY_PIT_COMM< nreq=-1
pv >C1:SUS-ETMY_YAW_COMM< nreq=-1
--- End processing files
--- Start searches
C1:SUS-ETMY_PIT_COMM ... ca_search_and_connect() ... OK
C1:SUS-ETMY_YAW_COMM ... ca_search_and_connect() ... OK
--- End searches
Waiting for 2 outstanding search(es) ...
Waiting for 2 outstanding search(es) ...
did not find 2
--- Start reads
C1:SUS-ETMY_PIT_COMM ... not connected so no ca_array_get_callback()
C1:SUS-ETMY_YAW_COMM ... not connected so no ca_array_get_callback()
--- End reads
--- Start wait for pending reads
-- End wait for pending reads 0 outstanding read(s)
**** END BURT LOGFILE
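The same failure can be pulled out of a BURT log automatically. This is a minimal sketch, assuming the log text has the layout shown above; `burt_unconnected` is an illustrative name, not a BURT tool. Note that the "search ... OK" lines in the log do not mean the channel connected, so the sketch keys on the "not connected" lines in the read phase instead.

```python
import re

# Read-phase lines for channels that never connected.
NOT_CONNECTED = re.compile(
    r"^[ \t]*(\S+) \.\.\. not connected so no ca_array_get_callback\(\)",
    re.MULTILINE,
)
# Summary line printed after the search phase gives up.
DID_NOT_FIND = re.compile(r"^[ \t]*did not find (\d+)", re.MULTILINE)


def burt_unconnected(log_text):
    """Return (channels that were not connected, count BURT reported missing)."""
    channels = NOT_CONNECTED.findall(log_text)
    found = DID_NOT_FIND.search(log_text)
    missing = int(found.group(1)) if found else 0
    return channels, missing


# Sample input copied from the log above.
SAMPLE = """\
--- Start searches
C1:SUS-ETMY_PIT_COMM ... ca_search_and_connect() ... OK
C1:SUS-ETMY_YAW_COMM ... ca_search_and_connect() ... OK
--- End searches
Waiting for 2 outstanding search(es) ...
Waiting for 2 outstanding search(es) ...
did not find 2
--- Start reads
C1:SUS-ETMY_PIT_COMM ... not connected so no ca_array_get_callback()
C1:SUS-ETMY_YAW_COMM ... not connected so no ca_array_get_callback()
--- End reads
"""

channels, missing = burt_unconnected(SAMPLE)
```

A save whose log yields a non-empty channel list can then be flagged before anyone relies on the resulting snapshot file.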
The burt save file has no values in it. Even if I copy over the ETMX save file and put in the correct channel names and values, a burt restore is unsuccessful.
So, I can do locking tonight by restoring and misaligning by hand, but this sucks, and needs to be fixed. Other optics (at least PRM, SRM, ETMX) seem to be working just fine. It's just ETMY that has a problem.
|
9412
|
Tue Nov 19 15:04:14 2013 |
Jenne | Update | CDS | Can talk to AUXEY again |
The ETMY sliders on IFO_ALIGN were white again this morning, so I went down to the Yend and pushed the RESET button on auxey. I then did a burt restore to 00:07am this morning for both auxey and auxex (since the stickers on the machines are still the old naming convention, I wonder if the autoburt is also backwards, so I did both). Now the 'save' and 'restore' scripts for ETMY are working again.
Hopefully it's all better now, but I'll keep an eye on it. |
9422
|
Fri Nov 22 09:54:22 2013 |
Steve | Update | CDS | DAQ? |
Jamie, I think the computers know that you are away. c1lsc keeps going down.
The short time plots are correct. |
Attachment 1: comp8d.png
|
|
9425
|
Mon Nov 25 10:57:14 2013 |
Koji | Update | CDS | woes on the X-end hosts |
This morning I came in to the 40m and found:
1) c1auxex was throwing out the same errors as recently seen.
2) c1iscex processes had errors which persisted even after the mx stream reset.
1) c1auxex - fixed
Tried telnet c1auxex => rejected by the host
Went down to the south end. Power cycled the target. Came back to the control room.
=> Confirmed the epics read/write is back.
Burtrestored the epics vars for the target to the snapshot from 31st Oct at 5:07.
2) c1iscex - still not fixed
ssh c1iscex
rtcds restart all => c1x01 is still in red.
Followed the procedure on the elog entry 9007. => Still the same error.
At least c1x01 is stalled. Here is the status.
Sync Source is TDS.
C1:DAQ-DC0_C1X01_STATUS is 0x2bad.
C1:DAQ-DC0_C1X01_CRC_SUM stays 0.
The screen shot is attached.
dmesg related to c1x01
controls@c1iscex ~ 0$ dmesg |grep c1x01
[ 32.152010] c1x01: startup time is 1069440223
[ 32.152012] c1x01: cpu clock 3000325
[ 32.152014] c1x01: Epics shmem set at 0xffffc9001489c000
[ 32.152208] c1x01: IPC at 0xffffc90018947000
[ 32.152209] c1x01: Allocated daq shmem; set at 0xffffc9000480c000
[ 32.152210] c1x01: configured to use 4 cards
[ 32.152211] c1x01: Initializing PCI Modules
[ 32.152226] c1x01: ADC card on bus b; device 4 prim b
[ 32.152227] c1x01: adc card on bus b; device 4 prim b
[ 32.154801] c1x01: pci0 = 0xdc300400
[ 32.154837] c1x01: pci2 = 0xdc300000
[ 32.154842] c1x01: ADC I/O address=0xdc300000 0xffffc90003f62000
[ 32.154845] c1x01: BCR = 0x84060
[ 32.154858] c1x01: RAG = 0x117d8
[ 32.154861] c1x01: BCR = 0x84260
[ 32.583220] c1x01: SSC = 0x16
[ 32.583223] c1x01: IDBC = 0x1f
[ 32.583236] c1x01: DAC card on bus 14; device 4 prim 14
[ 32.583237] c1x01: dac card on bus 14; device 4
[ 32.584527] c1x01: pci0 = 0xdc400400
[ 32.584546] c1x01: dac pci2 = 0xdc400000
[ 32.584551] c1x01: DAC I/O address=0xdc400000 0xffffc90003f6a000
[ 32.584555] c1x01: DAC BCR = 0x810
[ 32.584678] c1x01: DAC BCR after init = 0x30080
[ 32.584681] c1x01: DAC CSR = 0xffff
[ 32.584687] c1x01: DAC BOR = 0x3415
[ 32.584693] c1x01: set_8111_prefetch: subsys=0x8114; vendor=0x10e3
[ 32.584722] c1x01: Contec 1616 DIO card on bus 23; device 0
[ 32.593429] c1x01: contec 1616 dio pci2 = 0x4001
[ 32.593430] c1x01: contec 1616 diospace = 0x4000
[ 32.593434] c1x01: contec dio pci2 card number= 0x0
[ 32.593439] c1x01: Contec BO card on bus 18; device 0
[ 32.593447] c1x01: contec dio pci2 = 0x3001
[ 32.593448] c1x01: contec32L diospace = 0x3000
[ 32.593451] c1x01: contec dio pci2 card number= 0x0
[ 32.593456] c1x01: 5565 RFM card on bus 7; device 4
[ 32.597218] Modules linked in: c1x01(+) open_mx mbuf
[ 32.599939] [<ffffffffa002e430>] mapRfm+0x71/0x392 [c1x01]
[ 32.600199] [<ffffffffa002ec91>] mapPciModules+0x540/0x8cf [c1x01]
[ 32.600458] [<ffffffffa002f2c1>] init_module+0x2a1/0x9d6 [c1x01]
[ 32.600717] [<ffffffffa002f020>] ? init_module+0x0/0x9d6 [c1x01]
[ 32.616194] c1x01: RFM address is 0xd8000000
[ 32.616196] c1x01: CSR address is 0xdc000000
[ 32.616206] c1x01: Board id = 0x65
[ 32.616209] c1x01: DMA address is 0xdc000400
[ 32.616213] c1x01: 5565DMA at 0xffffc90003f72400
[ 32.616215] c1x01: 5565 INTCR = 0xf010100
[ 32.616217] c1x01: 5565 INTCR = 0xf000000
[ 32.616218] c1x01: 5565 MODE = 0x43
[ 32.616220] c1x01: 5565 DESC = 0x0
[ 32.616232] c1x01: 5 PCI cards found
[ 32.616233] c1x01: ***************************************************************************
[ 32.616234] c1x01: 1 ADC cards found
[ 32.616235] c1x01: ADC 0 is a GSC_16AI64SSA module
[ 32.616236] c1x01: Channels = 64
[ 32.616236] c1x01: Firmware Rev = 34
[ 32.616238] c1x01: ***************************************************************************
[ 32.616239] c1x01: 1 DAC cards found
[ 32.616239] c1x01: DAC 0 is a GSC_16AO16 module
[ 32.616240] c1x01: Channels = 16
[ 32.616241] c1x01: Filters = None
[ 32.616242] c1x01: Output Type = Differential
[ 32.616242] c1x01: Firmware Rev = 6
[ 32.616244] c1x01: MASTER DAC SLOT 0 1
[ 32.616244] c1x01: ***************************************************************************
[ 32.616246] c1x01: 0 DIO cards found
[ 32.616246] c1x01: ***************************************************************************
[ 32.616248] c1x01: 0 IIRO-8 Isolated DIO cards found
[ 32.616248] c1x01: ***************************************************************************
[ 32.616250] c1x01: 0 IIRO-16 Isolated DIO cards found
[ 32.616250] c1x01: ***************************************************************************
[ 32.616252] c1x01: 1 Contec 32ch PCIe DO cards found
[ 32.616252] c1x01: 1 Contec PCIe DIO1616 cards found
[ 32.616253] c1x01: 0 Contec PCIe DIO6464 cards found
[ 32.616254] c1x01: 2 DO cards found
[ 32.616255] c1x01: TDS controller 0 is at 0
[ 32.616256] c1x01: Total of 4 I/O modules found and mapped
[ 32.616257] c1x01: ***************************************************************************
[ 32.616259] c1x01: 1 RFM cards found
[ 32.616260] c1x01: RFM 0 is a VMIC_5565 module with Node ID 41
[ 32.616261] c1x01: address is 0x18d80000
[ 32.616261] c1x01: ***************************************************************************
[ 32.616262] c1x01: Initializing space for daqLib buffers
[ 32.616263] c1x01: Initializing Network
[ 32.616264] c1x01: Found 1 frameBuilders on network
[ 32.616265] c1x01: Epics burt restore is 0
[ 33.616012] c1x01: Epics burt restore is 0
[ 34.617018] c1x01: Epics burt restore is 0
[ 35.618017] c1x01: Epics burt restore is 0
[ 36.619011] c1x01: Epics burt restore is 0
[ 37.621007] c1x01: Epics burt restore is 0
[ 38.622008] c1x01: Epics burt restore is 0
[ 39.733257] c1x01: Sync source = 4
[ 39.733257] c1x01: Waiting for EPICS BURT Restore = 1
[ 39.793001] c1x01: Waiting for EPICS BURT 0
[ 39.793001] c1x01: BURT Restore Complete
[ 39.793001] c1x01: Found a BQF filter 0
[ 39.793001] c1x01: Found a BQF filter 1
[ 39.793001] c1x01: Initialized servo control parameters.
[ 39.794002] c1x01: DAQ Ex Min/Max = 1 3
[ 39.794002] c1x01: DAQ XEx Min/Max = 3 53
[ 39.794002] c1x01: DAQ Tp Min/Max = 10001 10007
[ 39.794002] c1x01: DAQ XTp Min/Max = 10007 10507
[ 39.794002] c1x01: DIRECT MEMORY MODE of size 64
[ 39.794002] c1x01: daqLib DCU_ID = 19
[ 39.794002] c1x01: Calling feCode() to initialize
[ 39.794002] c1x01: entering the loop
[ 39.794002] c1x01: ADC setup complete
[ 39.794002] c1x01: DAC setup complete
[ 39.794002] c1x01: writing BIO 0
[ 39.814002] c1x01: writing DAC 0
[ 39.814002] c1x01: Triggered the ADC
[ 40.874003] c1x01: timeout 0 1000000
[ 40.874003] c1x01: exiting from fe_code()
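The useful part of this dump is its tail: the ADC is triggered, a timeout is logged about a second later, and the model exits. A short sketch for extracting that from a long dmesg capture follows; it assumes the `[ time] model: message` layout seen above, and `model_startup_summary` is a hypothetical helper rather than part of the rtcds tools. The value of "Sync source" is reported as-is, with no attempt to interpret it.

```python
import re

# One kernel log line of the form "[   39.794002] c1x01: message".
LINE = re.compile(r"^\[\s*(?P<t>\d+\.\d+)\]\s+(?P<model>\w+):\s+(?P<msg>.*)$")


def model_startup_summary(dmesg_text, model):
    """Collect a few startup facts for one model from dmesg output."""
    msgs = []
    for line in dmesg_text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group("model") == model:
            msgs.append(match.group("msg").strip())
    sync = next(
        (m.split("=", 1)[1].strip() for m in msgs if m.startswith("Sync source")),
        None,
    )
    return {
        "sync_source": sync,
        "adc_triggered": "Triggered the ADC" in msgs,
        "timed_out": any(m.startswith("timeout") for m in msgs),
        "exited": "exiting from fe_code()" in msgs,
    }


# Sample input: a handful of lines copied from the dump above.
SAMPLE = """\
[ 32.597218] Modules linked in: c1x01(+) open_mx mbuf
[ 39.733257] c1x01: Sync source = 4
[ 39.794002] c1x01: entering the loop
[ 39.814002] c1x01: Triggered the ADC
[ 40.874003] c1x01: timeout 0 1000000
[ 40.874003] c1x01: exiting from fe_code()
"""

summary = model_startup_summary(SAMPLE, "c1x01")
```

Lines without a `model:` prefix, such as the "Modules linked in" line and the stack trace, are skipped by the pattern.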
|
Attachment 1: Screenshot.png
|
|
9426
|
Mon Nov 25 12:57:54 2013 |
Jamie | Update | CDS | timing problem at c1iscex IO chassis |
There is definitely a timing distribution malfunction at the c1iscex IO chassis. There is no timing link between the "Master Timer Sequencer D050239" at the 1X6 and the c1iscex IO chassis. Link lights at both ends are dead. No timing, no running models.
It does not appear to be a problem with the Master Timer Sequencer. I moved the c1iscey link to the J15 port on the sequencer and it worked fine. This means it's either a problem with the fiber or the timing card in the IO chassis. The IO timing card is powered and does have what appear to be normal status lights on (except for the fiber link lights). It's getting what I think is the nominal 4V power. The connection to the IO chassis backplane board looks ok. So maybe it's just a dead fiber issue?
I do not know what could have been the problem with c1auxex, or if it's related to the fast timing issue. |
9427
|
Mon Nov 25 17:28:33 2013 |
Jenne | Update | CDS | timing problem at c1iscex IO chassis |
Quote: |
There is definitely a timing distribution malfunction at the c1iscex IO chassis. There is no timing link between the "Master Timer Sequencer D050239" at the 1X6 and the c1iscex IO chassis. Link lights at both ends are dead. No timing, no running models.
It does not appear to be a problem with the Master Timer Sequencer. I moved the c1iscey link to the J15 port on the sequencer and it worked fine. This means it's either a problem with the fiber or the timing card in the IO chassis. The IO timing card is powered and does have what appear to be normal status lights on (except for the fiber link lights). It's getting what I think is the nominal 4V power. The connection to the IO chassis backplane board looks ok. So maybe it's just a dead fiber issue?
I do not know what could have been the problem with c1auxex, or if it's related to the fast timing issue.
|
Steve and Koji looked around, and called around, and there seem to be no spare fibers that are long enough to reach the end, so Steve has ordered
"Tripp Lite N520-30M 100' Multimode Duplex 50/125 Fiber Optic Patch Cable LC/LC"
and it should be here tomorrow. |
9428
|
Wed Nov 27 14:45:49 2013 |
Jenne | Update | CDS | timing problem at c1iscex IO chassis |
[Koji, Jenne]
The new fiber arrived today, and we tried it out. No luck. We think it is the timing card, so we'll need to get one, since we can't find a spare.
Order of operations:
* Lay new fiber on floor, plugged it in at both ends, saw no fiber link lights.
* From control room, killed all models running on c1iscex, shutdown computer. Still no link lights.
* Power cycled computer and IO chassis.
* Tried plugging new fiber into different port on Master Timing Sequencer, with other end still plugged in to c1iscex. Still no link lights.
* Looked around with flashlight at Xend IO chassis. The board that the fiber is connected to does not have a power light, although the board next to it has 2. We compared with the SUS IO chassis, and the board there with the fiber has one power light, plus the fiber link lights, as well as 2 on the board next to the fiber. So, perhaps there's a problem with power distribution on the timing board at the Xend?
* Unplugged and replugged the power connector to the timing board, inside the IO chassis. The board next to the fiber's board got its lights back, but the fiber's board did not. However, power must be going through the board with the fiber attached to reach the next board, so there's power on at least some part of the timing board, just not the whole thing.
From this, we conclude that the blue fiber that was in place is probably fine (or is not found guilty), and that we need a replacement timing board. Koji didn't find one in the "CDS stuff" boxes underneath the Jenne Laser, and I feel like I recall Jamie saying that we would have to get a spare from somewhere else. We rolled up the new spare fiber, and put it in the box with other "CDS Stuff" under the Jenne Laser table. |
9429
|
Wed Nov 27 16:29:21 2013 |
Jenne | Update | CDS | Accidentally turned off SUS IO chassis |
[Jenne, Koji]
I was trying to lock the Yarm, and saw that I was not getting signals to go between the LSC and SCY models. I had digital zeros for TRY, and when I overrode the trigger and tried to force signal to ETMY, I had digital zeros at the SUS-ETMY_LSC input. The corresponding filter bank in the rfm model was receiving signals, so the Dolphin connection between LSC and SUS was okay, it was just the RFM connection going to the end station that wasn't succeeding.
Koji restarted the c1scy model, and then went inside the IFO room, and found that the SUS IO chassis power was off. We must have accidentally turned it off while we were in there earlier. Koji turned on the power, and also restarted the rfm model, and we now have real signals going back and forth.
Yarm is locked, ASS worked nicely, etc, etc, so things seem normal again (with the Yarm....ETMX stuff is still out of order). |
9432
|
Mon Dec 2 14:24:10 2013 |
Steve | Update | CDS | computer problems |
Rack 1x6 is very noisy.
SunFire X4600 computer: FB (directly below Megatron) has its yellow warning light on. It must be losing one of its fan bearings.
Jetstore's error message: IDE channel #2 reading error |
Attachment 1: c1iscex.png
|
|
Attachment 2: 1X6.JPG
|
|
9433
|
Mon Dec 2 16:04:47 2013 |
Jamie | Update | CDS | c1iscex timing problem mysteriously disappears??? (thanksgiving miracle???) |
Quote: |
There is definitely a timing distribution malfunction at the c1iscex IO chassis. There is no timing link between the "Master Timer Sequencer D050239" at the 1X6 and the c1iscex IO chassis. Link lights at both ends are dead. No timing, no running models.
It does not appear to be a problem with the Master Timer Sequencer. I moved the c1iscey link to the J15 port on the sequencer and it worked fine. This means it's either a problem with the fiber or the timing card in the IO chassis. The IO timing card is powered and does have what appear to be normal status lights on (except for the fiber link lights). It's getting what I think is the nominal 4V power. The connection to the IO chassis backplane board looks ok. So maybe it's just a dead fiber issue?
I do not know what could have been the problem with c1auxex, or if it's related to the fast timing issue.
|
I just got over here from Downs, where I managed to convince Todd to let me borrow one of their three remaining timing slave boards for c1iscex. I walked down to the X end to replace the board only to discover that the link light on the existing timing board was back! c1iscex was not responding, so I hard rebooted the machine, and everything came up rosy (all green!):

To repeat, I DID NOTHING. The thing was working when I got here. I have no idea when it came back, or how, but it's at least working for the moment. I re-enabled the watchdog for ETMX SUS and it's now damped normally.
I'm going to hold on to the timing card for a couple of days, in case the failure comes back, but we'll need to return it to Downs soon, and probably think about getting some spare backups from Columbia. |
9434
|
Mon Dec 2 17:05:13 2013 |
Jenne | Update | CDS | c1iscex timing problem mysteriously disappears??? (thanksgiving miracle???) |
Steve was trying to do something to it this morning, but I'm not exactly clear on what it was. Maybe that helped? Steve, can you tell us what you were trying to do this morning? |
9435
|
Tue Dec 3 07:42:23 2013 |
Steve | Update | CDS | c1iscex timing problem mysteriously disappears??? (thanksgiving miracle???) |
Quote: |
Steve was trying to do something to it this morning, but I'm not exactly clear on what it was. Maybe that helped? Steve, can you tell us what you were trying to do this morning?
|
I was trying to repeat elog 9007. I only got to line 2 of Koji's Solution when Ottavia, where I was working, shut down. That was all I did. |
9436
|
Tue Dec 3 17:08:06 2013 |
Koji | Update | CDS | computer problems |
It seems that the front fan unit was running at the full speed. The fan itself seems still OK.
I talked with Jamie and power cycled the machine (i.e. shut it down gracefully, unplugged the power supply cables (x4), plugged them in again, and pushed the power button).
The warning signal went off and the fan is quiet. FOR NOW.
Now, daqd and ndsd are down.
FB cannot mount /opt/rtcds and /cvs/cds during its boot.
After mounting these manually, I tried to run /opt/rtcds/caltech/c1/target/fb/start_daqd.inittab and /opt/rtcds/caltech/c1/target/fb/start_nds.inittab
but they don't keep running.
I'll be back to this issue tomorrow with Jamie's help. |