daqd is still behaving unstably. It's still unclear what the issue is.
The current failures look like disk IO contention. However, it's hard to find any evidence that daqd is suffering from large IO wait while it's failing.
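One way to put a number on IO wait is to sample the aggregate "cpu" line of /proc/stat twice and look at the change in the iowait counter. The sketch below is only that, a sketch: the function names are made up for illustration, and the figure is system-wide rather than specific to daqd, so it can only show whether fb as a whole is stalled on IO during a failure.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two 'cpu' line samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted inside user/nice, so only sum the first eight.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the aggregate 'cpu' line)."""
    with open(path) as f:
        return f.readline()


def sample_iowait(interval=1.0):
    """Hypothetical helper: measure system-wide iowait over `interval` seconds."""
    before = read_cpu_line()
    time.sleep(interval)
    after = read_cpu_line()
    return iowait_fraction(before, after)
```

Calling sample_iowait() in a loop while daqd is running, and logging the result with a timestamp, would give something to line up against the times of the daqd failures.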
The frame size itself is currently smaller than it was before the upgrade:
controls@fb /frames/full 0$ ls -alth 11190 | head
total 369G
drwxr-xr-x 321 controls controls 36K Jul 12 22:20 ..
drwxr-xr-x 2 controls controls 268K Jun 23 06:06 .
-rw-r--r-- 1 controls controls 67M Jun 23 06:06 C-R-1119099984-16.gwf
-rw-r--r-- 1 controls controls 68M Jun 23 06:06 C-R-1119099968-16.gwf
-rw-r--r-- 1 controls controls 69M Jun 23 06:05 C-R-1119099952-16.gwf
-rw-r--r-- 1 controls controls 69M Jun 23 06:05 C-R-1119099936-16.gwf
-rw-r--r-- 1 controls controls 67M Jun 23 06:05 C-R-1119099920-16.gwf
-rw-r--r-- 1 controls controls 68M Jun 23 06:05 C-R-1119099904-16.gwf
-rw-r--r-- 1 controls controls 68M Jun 23 06:04 C-R-1119099888-16.gwf
controls@fb /frames/full 0$ ls -alth 11208 | head
total 17G
drwxr-xr-x 2 controls controls 20K Jul 13 01:00 .
-rw-r--r-- 1 controls controls 45M Jul 13 01:00 C-R-1120809632-16.gwf
-rw-r--r-- 1 controls controls 50M Jul 13 01:00 C-R-1120809408-16.gwf
-rw-r--r-- 1 controls controls 50M Jul 13 00:56 C-R-1120809392-16.gwf
-rw-r--r-- 1 controls controls 50M Jul 13 00:56 C-R-1120809376-16.gwf
-rw-r--r-- 1 controls controls 50M Jul 13 00:56 C-R-1120809360-16.gwf
-rw-r--r-- 1 controls controls 50M Jul 13 00:55 C-R-1120809344-16.gwf
-rw-r--r-- 1 controls controls 50M Jul 13 00:55 C-R-1120809328-16.gwf
controls@fb /frames/full 0$
This would seem to indicate that it's not an increase in frame size that's to blame.
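The July listing above also appears to jump from GPS start time 1120809408 straight to 1120809632, i.e. 208 s (13 frames of 16 s) with no file. A minimal sketch for finding such gaps from the file names alone, assuming the C-R-<GPS start>-<duration>.gwf naming seen in the listings (find_gaps is a name invented here):

```python
import re

# Frame file names as they appear in the listings: C-R-<GPS start>-<duration>.gwf
FRAME_RE = re.compile(r"^C-R-(\d+)-(\d+)\.gwf$")


def find_gaps(filenames):
    """Return (gap_start, gap_end) GPS pairs where no frame file covers the time."""
    spans = []
    for name in filenames:
        m = FRAME_RE.match(name)
        if m:
            start, duration = int(m.group(1)), int(m.group(2))
            spans.append((start, start + duration))
    spans.sort()
    gaps = []
    # Compare the end of each frame with the start of the next one.
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps
```

Feeding it os.listdir() of one of the frame directories would list every stretch where no frame was written, which could then be compared against the times daqd is known to have died.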
Because slow data is now transported to daqd over the MX data concentrator network rather than via EPICS (RTS 2.8), there is more traffic on the MX network. I also note that the channel lists have increased in size:
controls@fb /opt/rtcds/caltech/c1/chans/daq 0$ ls -alt archive/C1LSC* | head -20
-rw-r--r-- 1 4294967294 4294967294 262554 Jul 6 18:21 archive/C1LSC_150706_182146.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul 6 18:16 archive/C1LSC_150706_181603.ini
-rw-r--r-- 1 4294967294 4294967294 262554 Jul 6 16:09 archive/C1LSC_150706_160946.ini
-rw-r--r-- 1 4294967294 4294967294 43366 Jul 1 16:05 archive/C1LSC_150701_160519.ini
-rw-r--r-- 1 4294967294 4294967294 43366 Jun 25 15:47 archive/C1LSC_150625_154739.ini
...
I would have thought, though, that data transmission errors would show up in the daqd status bits.
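The C1LSC channel list above grew from 43366 to 262554 bytes, roughly a factor of six. File size is only a proxy, so it may be worth counting the channels directly. The sketch below assumes the .ini files are ordinary INI files with one section per channel plus a [default] section, which I have not verified against the files themselves; count_channels is a hypothetical helper.

```python
import configparser


def count_channels(ini_text):
    """Count channel sections in a DAQ .ini file, ignoring any [default] section."""
    # strict=False tolerates duplicate sections; interpolation=None leaves '%' alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    return sum(1 for s in parser.sections() if s.lower() != "default")


def count_channels_in_file(path):
    """Convenience wrapper that reads the file at `path` and counts its channels."""
    with open(path) as f:
        return count_channels(f.read())
```

Running count_channels_in_file() on an archived list from before and after the upgrade would show how many more channels are now being sent to daqd.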