I heard a rumor about a DAQ problem at the 40m.
To investigate, I tried retrieving data from some channels under C1:SUS-AS1 on the c1sus2 front end. DQ channels worked fine; testpoint channels did not. This pointed to an issue involving communication with awgtpman. However, AWG excitations did work, so the issue seemed to be specific to the communication between daqd and awgtpman.
daqd logs were complaining of an error in the tpRequest function: error code -3/couldn't create test point handle. (Confusingly, part of the error message was buffered somewhere, and would only print after a subsequent connection to daqd was made.) This message signifies some kind of failure in setting up the RPC connection to awgtpman. A further error string is available from the system to explain the cause of the failure, but daqd does not provide it. So we have to guess...
One of the reasons an RPC connection can fail is if the server name cannot be resolved. Indeed, address lookup for c1sus2 from fb1 was broken:
$ host c1sus2
Host c1sus2 not found: 3(NXDOMAIN)
In /etc/resolv.conf on fb1 there was the following line:
Changing this to
search martian
got address lookup on fb1 working:
$ host c1sus2
c1sus2.martian has address 192.168.113.87
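For context, the search directive controls how the resolver expands a short name like c1sus2 before it queries the nameserver. The Python sketch below only illustrates that expansion; it is not the actual resolver code. The function names are hypothetical, and it assumes the default behavior in which a name without dots is tried with each search domain first and then as-is.

```python
def parse_search_domains(resolv_conf_text):
    """Return the domains from the last 'search' line of resolv.conf text."""
    domains = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        fields = line.split()
        if fields[0] == "search":
            domains = fields[1:]  # a later search line overrides earlier ones
    return domains


def candidate_names(short_name, domains):
    """List the names that would be tried for a dot-free short name."""
    return [f"{short_name}.{d}" for d in domains] + [short_name]


if __name__ == "__main__":
    conf = "search martian\n"
    print(candidate_names("c1sus2", parse_search_domains(conf)))
    # prints ['c1sus2.martian', 'c1sus2']
```

With the corrected search line, the first candidate is c1sus2.martian, which matches the name in the host output above.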
But testpoints still could not be retrieved from c1sus2, even after a daqd restart.
In /etc/hosts on fb1 I found the following:
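Changing the hardcoded address to the value returned by the nameserver (192.168.113.87) fixed the problem. This would explain why fixing DNS alone was not enough: the host command queries the nameserver directly, whereas programs that go through the system resolver (presumably including daqd) typically consult /etc/hosts first.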
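A check along these lines could catch such a mismatch earlier. The Python sketch below parses hosts-file text and reports entries whose address differs from what a lookup function returns. The function names are hypothetical, and the lookup is passed in as a parameter. In practice it would need to query the nameserver directly (for example by wrapping the host command), since the system resolver would typically read /etc/hosts itself. The stale address in the example is a placeholder, not the value that was actually in the file.

```python
def parse_hosts(hosts_text):
    """Map each hostname and alias in hosts-file text to its address."""
    table = {}
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, then tokenize
        if len(fields) >= 2:
            address, names = fields[0], fields[1:]
            for name in names:
                table.setdefault(name, address)  # first entry for a name wins
    return table


def find_stale_entries(hosts_table, dns_lookup):
    """Return {name: (hosts_address, dns_address)} where the two disagree."""
    stale = {}
    for name, hosts_address in hosts_table.items():
        dns_address = dns_lookup(name)  # None means DNS has no answer
        if dns_address is not None and dns_address != hosts_address:
            stale[name] = (hosts_address, dns_address)
    return stale


if __name__ == "__main__":
    # Placeholder hosts entry; 192.0.2.10 is a documentation-only address.
    hosts_text = "192.0.2.10 c1sus2\n"
    # Stand-in for a real DNS query, using the answer quoted above.
    dns_answers = {"c1sus2": "192.168.113.87"}
    print(find_stale_entries(parse_hosts(hosts_text), dns_answers.get))
```

Names that DNS does not know about are skipped rather than flagged, so entries that exist only in the hosts file do not produce false alarms.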
It might be even better to remove the hardcoded addresses of front ends from the hosts file, letting DNS function as the sole source of truth. But a full system restart should be performed after such a change, to ensure nothing else is broken by it. I leave that for another time.