ATF eLog
Entry  Fri Dec 10 10:46:11 2010, Zach, Computing, DAQ, busted again! 
    Reply  Fri Dec 10 11:21:14 2010, Alastair, Computing, DAQ, busted again! 
       Reply  Fri Dec 10 11:44:05 2010, Alastair, Computing, DAQ, busted again! 
Message ID: 1207     Entry time: Fri Dec 10 11:44:05 2010     In reply to: 1206
Author: Alastair 
Type: Computing 
Category: DAQ 
Subject: busted again! 

Quote:

Quote:

 The DAQ is doing the same thing it was doing a little over a week ago. Namely, it seems to reset itself or otherwise get flustered every minute or so, which makes any data request futile. Let me remind everyone what happened last time:

  1. Found the problem, tried to fix it using DAQ Reload, etc.
  2. Thought that the DAQ Rate might have been the issue, so I opened daqconfig and removed some channels
  3. Restarted daqd
  4. daqd didn't seem to stay alive---problem
  5. Got Frank
  6. Frank observed the issue and saw that I wasn't just drooling on fb0
  7. Rebooted fb0
  8. fb0 complained about something on boot and forced a disk check
  9. A day passed
  10. Followed some instructions for removing the problem (mainly "yes, yes, yes, yes")
  11. I mistakenly allowed it to reboot again without changing some manual settings that I needed to have
  12. fb0 complained about something on boot and forced a disk check
  13. A day passed
  14. Frank did some magic in the recovery menu and then rebooted fb0
  15. Everything worked for a week and a half or so
  16. Step 1

I'm not sure if Frank got an intuitive grasp of what was wrong last time, but it appears to be the same issue. I imagine that going through this list (perhaps with some omissions) will fix the problem again, but that will take a day or so and the fix may only last a few days. Suggestions?

 The realtime stuff all seems to be working fine; it's just the frames that aren't getting written.  Running dataviewer on FB1, I can get realtime data for our channels.  If I try to get data from the past, I get:

Connecting to NDS Server fb0 (TCP port 8088)
Connecting.... done
T0=10-12-10-19-19-05; Length=60 (s)
No data found

Also, the installation of dataviewer on WS2 doesn't seem to work. Does anyone know about this?  Last I knew, it wasn't installed on WS2 at all.  It opens fine, but when you try to get data it prints "incomplete installation" in the terminal window.

 It does seem to have the same symptoms as before.  When I looked at FB0 it was running daqd with a startup time of 11:25.  I looked again just now and the startup time is listed as 11:35, so it seems to be restarting all the time.  The weird thing is that it does appear to be running the whole time: I've not found one instance where I've checked it and found it not running.  The daqd log is not so enlightening for me, but it does list a whole pile of errors; I just don't know what its log looks like in normal operation.  Some of them look quite serious to me, like "Test point manager on fb may be down".  Log file reproduced below:

startup file interpreter thread tid=1084229952
calling yyparse(3, 4)
[Fri Dec 10 11:40:02 2010] ->3: set allow_tpman_connect_fail
[Fri Dec 10 11:40:02 2010] ->3: set avoid_reconnect
[Fri Dec 10 11:40:02 2010] ->3: #set cit_40m=1
[Fri Dec 10 11:40:02 2010] ->3: set dcu_rate 10=32768
[Fri Dec 10 11:40:02 2010] ->3: set dcu_rate 11=2048
[Fri Dec 10 11:40:02 2010] ->3: set controller_dcu=10
[Fri Dec 10 11:40:02 2010] ->3: set dcu_status_check=1
[Fri Dec 10 11:40:02 2010] ->3: set debug=0
[Fri Dec 10 11:40:02 2010] ->3: set zero_bad_data=0
[Fri Dec 10 11:40:02 2010] ->3: set master_config="/cvs/cds/caltech/target/fb/master"
[Fri Dec 10 11:40:02 2010] finished configuring data channels
[Fri Dec 10 11:40:02 2010] ->3: configure channels begin end
[Fri Dec 10 11:40:02 2010] ->3: set gds_server = "fb" "fb" 0 0
Fri Dec 10 11:40:04 2010 [16362]:Failed to connect to the test point manager node 1
Fri Dec 10 11:40:04 2010 [16362]:Test point manager on fb may be down
Fri Dec 10 11:40:04 2010 [16362]:tpRequest() returned -4
[Fri Dec 10 11:40:04 2010] ->3: set gps_leaps = 820108813
[Fri Dec 10 11:40:04 2010] ->3: set detector_name="CIT"
[Fri Dec 10 11:40:04 2010] ->3: set detector_prefix="C2"
[Fri Dec 10 11:40:04 2010] ->3: set detector_longitude=-90.7742403889
[Fri Dec 10 11:40:04 2010] ->3: set detector_latitude=30.5628943337
[Fri Dec 10 11:40:04 2010] ->3: set detector_elevation=.0
[Fri Dec 10 11:40:04 2010] ->3: set detector_azimuths=1.1,4.7123889804
[Fri Dec 10 11:40:04 2010] ->3: set detector_altitudes=1.0,2.0
[Fri Dec 10 11:40:04 2010] ->3: set detector_midpoints=2000.0, 2000.0
[Fri Dec 10 11:40:04 2010] ->3: #enable frame_wiper
[Fri Dec 10 11:40:04 2010] ->3: set num_dirs = 200
[Fri Dec 10 11:40:04 2010] ->3: set frames_per_dir=225
[Fri Dec 10 11:40:04 2010] ->3: set full_frames_per_file=1
[Fri Dec 10 11:40:04 2010] ->3: set full_frames_blocks_per_frame=32
[Fri Dec 10 11:40:04 2010] ->3: set frame_dir="/frames/full", "C-R-", ".gwf"
[Fri Dec 10 11:40:04 2010] ->3: #scan frames
[Fri Dec 10 11:40:04 2010] Frame file wiper cannot be enabled when gps_time_dirs==1
Write external script to clean up old frame files

[Fri Dec 10 11:40:04 2010] trend frame wiper enabled
[Fri Dec 10 11:40:04 2010] ->3: enable trend_frame_wiper
[Fri Dec 10 11:40:04 2010] ->3: set trend_num_dirs=60
[Fri Dec 10 11:40:04 2010] ->3: set trend_frames_per_dir=1440
[Fri Dec 10 11:40:04 2010] ->3: set trend_frame_dir= "/frames/trend/second", "C-T-", ".gwf"
[Fri Dec 10 11:40:04 2010] ->3: set raw-minute-trend-dir="/frames/trend/minute_raw"
[Fri Dec 10 11:40:04 2010] ->3: set nds-jobs-dir="/cvs/cds/caltech/target/fb"
[Fri Dec 10 11:40:04 2010] Frame file wiper cannot be enabled when gps_time_dirs==1
Write external script to clean up old frame files

[Fri Dec 10 11:40:04 2010] minute trend frame wiper enabled
[Fri Dec 10 11:40:04 2010] ->3: enable minute-trend-frame-wiper
[Fri Dec 10 11:40:04 2010] ->3: set minute-trend-num-dirs=10
[Fri Dec 10 11:40:04 2010] ->3: set minute-trend-frames-per-dir=24
[Fri Dec 10 11:40:04 2010] ->3: set minute-trend-frame-dir="/frames/trend/minute", "C-M-", ".gwf"
[Fri Dec 10 11:40:04 2010] ->3: #scan minute-trend-frames
[Fri Dec 10 11:40:04 2010] ->3: #scan trend-frames
[Fri Dec 10 11:40:04 2010] ->3: #scan frames
[Fri Dec 10 11:40:04 2010] ->3: start main 30
Error ignored because "allow_tpman_connect_failure" is set in daqdrc file
Fri Dec 10 11:40:04 2010 [16362]:Allocated move buffer size 528014 bytes
[Fri Dec 10 11:40:04 2010] main started
[Fri Dec 10 11:40:04 2010] ->3: start profiler
[Fri Dec 10 11:40:04 2010] ->3: # comment out this block to stop saving data
[Fri Dec 10 11:40:04 2010] frame saver started
[Fri Dec 10 11:40:04 2010] ->3: start frame-saver
[Fri Dec 10 11:40:05 2010] ->3: sync frame-saver
[Fri Dec 10 11:40:05 2010] ->3: start trender
[Fri Dec 10 11:40:05 2010] trender started
[Fri Dec 10 11:40:05 2010] trend frame saver started
[Fri Dec 10 11:40:05 2010] ->3: start trend-frame-saver
[Fri Dec 10 11:40:06 2010] ->3: sync trend-frame-saver
[Fri Dec 10 11:40:06 2010] ->3: # dont' need these
[Fri Dec 10 11:40:06 2010] ->3: #start minute-trend-frame-saver
[Fri Dec 10 11:40:06 2010] ->3: #sync minute-trend-frame-saver
[Fri Dec 10 11:40:06 2010] raw minute trend frame saver started
[Fri Dec 10 11:40:06 2010] ->3: start raw-minute-trend-saver
[Fri Dec 10 11:40:06 2010] ->3: #start frame-writer "225.225.225.1" broadcast="131.215.113.0" all
[Fri Dec 10 11:40:06 2010] ->3: #sleep 5
[Fri Dec 10 11:40:06 2010] producer started
[Fri Dec 10 11:40:06 2010] ->3: start producer
[Fri Dec 10 11:40:06 2010] ->3: start epics dcu
[Fri Dec 10 11:40:06 2010] edcu started
[Fri Dec 10 11:40:06 2010] ->3: start epics server "C0:DAQ-FB0_" "C2:DAQ-FB0_"
[Fri Dec 10 11:40:06 2010] epics server started
[Fri Dec 10 11:40:06 2010] ->3: start listener 8087
[Fri Dec 10 11:40:06 2010] ->3: start listener 8088 1
[Fri Dec 10 11:40:06 2010] ->3: sleep 60
[Fri Dec 10 11:40:06 2010] EDCU has 175 channels configured; first=0

[Fri Dec 10 11:40:06 2010] Epics server started
[Fri Dec 10 11:40:06 2010] Opened /rtl_mem_atf_daq
cas warning: Configured TCP port was unavailable.
cas warning: Using dynamically assigned TCP port 49166,
cas warning: but now two or more servers share the same UDP port.
cas warning: Depending on your IP kernel this server may not be
cas warning: reachable with UDP unicast (a host's IP in EPICS_CA_ADDR_LIST)
[Fri Dec 10 11:40:06 2010] Opened /rtl_mem_psl_daq

[Fri Dec 10 11:40:07 2010] Waiting for DCU 10 to show Up
[Fri Dec 10 11:40:07 2010] Detected controller DCU 10

[Fri Dec 10 11:40:08 2010] Minute trender made GPS time correction; gps=976045221; gps%60=21
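As a sanity check on the last log line, here is a small Python sketch that converts the reported GPS time to UTC. It assumes the standard GPS epoch (1980-01-06 00:00:00 UTC) and a GPS-UTC offset of 15 leap seconds, which is the value that applied in December 2010. The function and constant names are mine, not anything from daqd.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from this epoch and ignores leap seconds.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# Assumption: GPS was 15 s ahead of UTC in December 2010.
GPS_MINUS_UTC_DEC_2010 = 15


def gps_to_utc(gps_seconds, leap_seconds=GPS_MINUS_UTC_DEC_2010):
    """Convert GPS seconds to a timezone-aware UTC datetime."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


gps = 976045221  # value from the "Minute trender" log line
print(gps % 60)                      # 21
print(gps_to_utc(gps).isoformat())   # 2010-12-10T19:40:06+00:00
```

19:40:06 UTC is 11:40:06 Pacific Standard Time, which agrees with the 11:40 timestamps in the log to within a couple of seconds, so the GPS time daqd is using looks sane. The remainder of 21 also matches the gps%60=21 printed in the log.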

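To measure how often daqd is restarting without polling its startup time by hand, a rough Python sketch like the following could scan the daqd log for the "main started" line and print the gaps between starts. It assumes each daqd start writes exactly one such line in the bracketed timestamp format seen in the log, and that the system uses an English/C locale for day and month names. The function names are hypothetical, and the first sample line is made up purely for illustration.

```python
import re
from datetime import datetime

# Matches lines such as "[Fri Dec 10 11:40:04 2010] main started".
START_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] main started\s*$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # needs an English/C locale


def find_daqd_starts(log_text):
    """Return the timestamp of every 'main started' line, in file order."""
    starts = []
    for line in log_text.splitlines():
        match = START_RE.match(line)
        if match:
            starts.append(datetime.strptime(match.group("ts"), TS_FORMAT))
    return starts


def restart_intervals(starts):
    """Seconds elapsed between consecutive daqd starts."""
    return [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]


# Illustrative input only: the 11:35:04 line is invented for the demo;
# the 11:40:04 line is the one that appears in the log.
sample = """\
[Fri Dec 10 11:35:04 2010] main started
[Fri Dec 10 11:35:05 2010] trender started
[Fri Dec 10 11:40:04 2010] main started
"""

print(restart_intervals(find_daqd_starts(sample)))  # [300.0]
```

Run against the real log file, a list of short, roughly equal intervals would confirm a periodic restart rather than a one-off crash.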