ID   Date   Author   Type   Category   Subject
  13940   Mon Jun 11 17:18:39 2018   pooja   Update   Cameras   CCD calibration

Aim: To calibrate the CCD of the GigE camera using an LED1050E.

The following table shows some of the specifications for the LED1050E as given in the Thorlabs datasheet.

Specification                        Typical    Maximum rating
DC forward current (mA)                         100
Forward voltage (V) @ 20 mA (VF)     1.25       1.55
Forward optical power (mW)           1.6
Total optical power (mW)             2.5
Power dissipation (mW)                          130

 The circuit diagram is given in Attachment 1.

Considering a power supply voltage Vcc = 15 V, a current I = 20 mA, and an LED forward voltage VF = 1.25 V, the series resistance in the circuit is calculated as

R = (Vcc - VF)/I = (15 V - 1.25 V)/(20 mA) = 687.5 Ω

Attachment 2 gives a plot of resistance (R) vs input voltage (Vcc) when a current of 20 mA flows through the circuit. I hope I can proceed with this setup soon.
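
For reference, the same Ohm's-law bookkeeping can be swept over Vcc to reproduce the curve in Attachment 2; a minimal Python sketch using only the numbers quoted above:

I = 20e-3                     # LED drive current (A)
VF = 1.25                     # LED forward voltage at 20 mA (V)
for Vcc in [5, 10, 15, 20]:   # supply voltage sweep (V)
    R = (Vcc - VF) / I        # required series resistance (ohm)
    print(f"Vcc = {Vcc:2d} V -> R = {R:6.1f} ohm")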

 

  13951   Tue Jun 12 19:27:25 2018   pooja   Update   Cameras   CCD calibration

Today I made the LED (1050 nm) circuit inside a box as described in my previous elog. Steve drilled a 1 mm hole in the box as an aperture for the LED light.

Resistance (R) used = 665 Ω.

We connected a power supply, and IR light was detected using an IR card.

Later we changed the input voltage and measured the optical power using a powermeter.

Input voltage Vcc (V)    Optical power
0 (dark reading)         60 nW
15                       68 µW
18                       82.5 µW
20                       92 µW

Since the measured optical power values are very low, we may need to drill a larger hole.

Now the hole is approximately 7 mm from the LED, so the aperture angle is approximately 2·arctan(0.5/7) ≈ 8°. From the radiometric curve given in the LED1050E datasheet, most of the power is within 20°. So a hole of diameter 2 × 7 mm × tan(10°) ≈ 2.5 mm may be required.
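
The aperture geometry is simple enough to script; a small Python check of both numbers (distances taken from this entry):

import math

d_led = 7.0    # LED-to-hole distance (mm)
r_hole = 0.5   # radius of the current 1 mm hole (mm)
full_angle = 2 * math.degrees(math.atan(r_hole / d_led))
print(f"current aperture angle ~ {full_angle:.1f} deg")    # ~8 deg

half_angle = 10.0   # capture the +/-10 deg lobe from the radiometric curve
hole_dia = 2 * d_led * math.tan(math.radians(half_angle))
print(f"required hole diameter ~ {hole_dia:.1f} mm")       # ~2.5 mm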

I have also attached a photo of the LED beam spot on the IR detection card.

  14633   Thu May 23 10:18:39 2019   Kruthi   Update   Cameras   CCD calibration

On Tuesday, I tried reproducing Pooja's measurements (https://nodus.ligo.caltech.edu:8081/40m/13986). The table below shows the values I got. Pictures of the LED circuit, schematic, and the setup are attached. The powermeter readings fluctuated quite a bit for input voltages (Vcc) > 8 V, so I assume a maximum uncertainty of 50 µW, to be on the safe side. Though the readings at lower input voltages didn't vary much over time (variation < 2 µW), I don't know how reliable the Ophir powermeter is at such low power levels. The optical power output of the LED was linear for input voltages from 10 V to 20 V (a least-squares check is sketched after the table). I'll proceed with the CCD calibration soon.

Input voltage Vcc (V)    Optical power
0 (dark reading)         1.6 nW
2                        55.4 µW
4                        215.9 µW
6                        0.398 mW
8                        0.585 mW
10                       0.769 mW
12                       0.929 mW
14                       1.065 mW
16                       1.216 mW
18                       1.330 mW
20                       1.437 mW
22                       1.484 mW
24                       1.565 mW
26                       1.644 mW
28                       1.678 mW
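
To back up the linearity claim, a straight line can be fit through the 10-20 V rows; a short numpy sketch (data copied from the table above):

import numpy as np

vcc = np.array([10, 12, 14, 16, 18, 20])                     # input voltage (V)
p_mw = np.array([0.769, 0.929, 1.065, 1.216, 1.330, 1.437])  # optical power (mW)
slope, offset = np.polyfit(vcc, p_mw, 1)
resid = p_mw - (slope * vcc + offset)
print(f"slope = {slope*1e3:.1f} uW/V, max residual = {np.abs(resid).max()*1e3:.1f} uW")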

  14621   Sat May 18 12:19:36 2019   Kruthi   Update   CCD calibration and telescope design

I went through all the elog entries related to CCD calibration. I was wondering if we can use Spectralon diffuse reflectance standards (https://www.labsphere.com/labsphere-products-solutions/materials-coatings-2/targets-standards/diffuse-reflectance-standards/diffuse-reflectance-standards/) instead of white paper, as they would be a better approximation to a Lambertian scatterer.

Telescope design:
On calculating the accessible u-v ranges and the % error in magnification (more precisely, % deviation), I got % deviation of order 10 and in some cases of order 100 (attachments 1 to 4), which matches Pooja's calculations. But I'm not able to reproduce Jigyasa's % error calculations, where the % error is of order 10^-1. I couldn't find the code that she had used for these calculations, and I even mailed her about it. We can still image with the 150-250 mm combination as proposed by Jigyasa, but I don't think it ensures maximum usage of the pixel array. Also, for this combination the resulting conjugate ratio will be greater than 5, so the use of plano-convex lenses will reduce spherical aberrations. I also explored other focal length combinations such as 250-500 mm and 500-500 mm. In these cases, both lenses will have f-numbers greater than 5. But the conjugate ratios will be less than 5, so biconvex lenses will be a better choice.

Constraints: available lens tube length (max value of d) = 3"; object distance range (u) = 70 cm to 150 cm; available cylindrical enclosures (max value of d+v) are 52 cm and 20 cm long (https://nodus.ligo.caltech.edu:8081/40m/13000).

I calculated the resultant image distance (v) and the required distance between lenses (d), for fixed magnifications (i.e. m = -0.06089 and m = -0.1826 for imaging the test masses and the beam spot respectively) and different values of 'u'. This way we can ensure that no pixels are wasted. The focal length combinations - 300-300 mm (for imaging the beam spot) and 100-125 mm (for imaging the test masses) - were the only ones that gave all positive values for 'd' and 'v' over the given range of 'u' (attachments 5-6). But here 'd' ranges from 0 to 30 cm in the first case, which exceeds the available lens tube length. Also, in the second case the f-numbers will be less than 5 for 2" lenses and may thus result in spherical aberration.
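
For concreteness, here is a hedged sketch of the two-lens bookkeeping described above. It assumes the real-is-positive thin-lens convention (1/u + 1/v = 1/f, m = -v/u) with m = m1*m2; the sign conventions used for attachments 5-6 may differ, so treat the numbers as illustrative rather than a reproduction of those results. A negative d or v flags a configuration that is unphysical under this convention.

import numpy as np

def two_lens(u, f1, f2, m):
    """Given object distance u, focal lengths f1 and f2, and target total
    magnification m (all in cm), return lens separation d and image distance v."""
    v1 = f1 * u / (u - f1)    # image distance of the first lens
    m1 = -v1 / u              # magnification of the first lens
    u2 = f2 * (1 - m1 / m)    # object distance for the second lens
    d = v1 + u2               # required separation between the lenses
    v = -(m / m1) * u2        # final image distance
    return d, v

for u in np.arange(70, 151, 20):               # object distance range (cm)
    d, v = two_lens(u, 30.0, 30.0, -0.1826)    # 300-300 mm combination, beam spot
    print(f"u = {u:5.1f} cm -> d = {d:6.2f} cm, v = {v:6.2f} cm")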

All this fuss about f-numbers, conjugate ratios, and plano-convex/biconvex lenses is to reduce spherical aberrations. But how much will spherical aberrations affect our readings? 

We have two 2" biconvex lenses of 150mm focal length and one 2" biconvex lens of focal length 250mm in stock. I'll start off with these and once I have a metric to quantify spherical aberrations we can further decide upon lenses to improve the telescopic lens system.

  13986   Tue Jun 19 14:08:37 2018   pooja   Update   Cameras   CCD calibration using LED1050E

Aim: To measure the optical power from the LED using a powermeter.

Yesterday Gautam drilled a larger hole of diameter 5 mm in the box as an aperture for the LED (aperture angle ≈ 2·arctan(2.5/7) ≈ 39°). I repeated the measurements that I had done before (https://nodus.ligo.caltech.edu:8081/40m/13951). The optical power measured using a powermeter and the corresponding input voltages are listed below.

Input voltage Vcc (V)    Optical power
0 (dark reading)         0.8 nW
10                       1.05 mW
12                       1.15 mW
15                       1.47 mW
16                       1.56 mW
18                       1.81 mW

So we are able to receive optical power close to the value (1.6 mW) given in the Thorlabs datasheet for the LED1050E (https://www.thorlabs.com/drawings/e6da1d5608eefd5c-035CFFE5-C317-209E-7686CA23F717638B/LED1050E-SpecSheet.pdf). I hope we can proceed to BRDF measurements for CCD calibration.

Steve: did you center the LED ?

  13991   Wed Jun 20 20:39:36 2018   pooja   Update   Cameras   CCD calibration using LED1050E

 

Quote:

Aim: To measure the optical power from the LED using a powermeter.

Yesterday Gautam drilled a larger hole of diameter 5 mm in the box as an aperture for the LED (aperture angle ≈ 2·arctan(2.5/7) ≈ 39°). I repeated the measurements that I had done before (https://nodus.ligo.caltech.edu:8081/40m/13951). The optical power measured using a powermeter and the corresponding input voltages are listed below.

Input voltage Vcc (V)    Optical power
0 (dark reading)         0.8 nW
10                       1.05 mW
12                       1.15 mW
15                       1.47 mW
16                       1.56 mW
18                       1.81 mW

So we are able to receive optical power close to the value (1.6 mW) given in the Thorlabs datasheet for the LED1050E (https://www.thorlabs.com/drawings/e6da1d5608eefd5c-035CFFE5-C317-209E-7686CA23F717638B/LED1050E-SpecSheet.pdf). I hope we can proceed to BRDF measurements for CCD calibration.

Steve: did you center the LED ?

Yes.

  8880   Fri Jul 19 12:23:34 2013   manasa   Update   CDS   CDS FE not happy

I found CDS rt processes in red. I did 'mxstreamrestart' from the medm. It did not help. Also ssh'd into c1iscex and tried 'mxstreamrestart' from the command line. It did not work either.

I thought restarting the framebuilder would help. I ssh'd to fb. But when I tried to restart fb I got the following error:

controls@fb ~ 0$ telnet fb 8088
Trying 192.168.113.202...
telnet: connect to address 192.168.113.202: Connection refused

 

Screenshot-Untitled_Window.png

  8881   Fri Jul 19 14:04:24 2013   Koji   Update   CDS   CDS FE not happy

daqd was restarted.


- tried telnet fb 8088 on rossa => same error as manasa had

- tried telnet fb 8087 on rossa => same result

- sshed into fb ssh fb

- tried to find daqd by ps -def | grep daqd => not found

- looked at wiki https://wiki-40m.ligo.caltech.edu/New_Computer_Restart_Procedures?highlight=%28daqd%29

- the wiki page suggested the following command to run daqd: /opt/rtcds/caltech/c1/target/fb/daqd -c ./daqdrc &

- ran ps -def | grep nds => already exists. Left untouched.

- Left fb.

- tried telnet fb 8087 on rossa => now it works

  4770   Tue May 31 11:26:29 2011   josephb   Update   CDS   CDS Maintenance

1) Checked in the changes I had made to the c1mcp.mdl model just before leaving for Elba.

2) The c1x01 and c1scx kernel modules had stopped running due to an ADC timeout. 

According to dmesg on c1iscex, they died at 3426838 seconds after starting (which corresponds to ~39 days). "uptime" indicates c1iscex was up for 46 days, 23 hours. So my guess is that about 8 days ago (last Monday or Tuesday), they both died when the ADCs failed to respond quickly enough for an unknown reason.
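
The arithmetic, spelled out (numbers from dmesg and uptime above):

mod_uptime_s = 3426838             # seconds the modules ran before the ADC timeout
mod_days = mod_uptime_s / 86400    # ~39.7 days
host_days = 46 + 23 / 24           # c1iscex uptime: 46 days, 23 hours
print(f"modules died after {mod_days:.1f} d, i.e. ~{host_days - mod_days:.1f} d ago")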

I used the kill scripts (in /opt/rtcds/caltech/c1/scripts/) to kill c1spx, c1scx, and c1x01.  I then used the start scripts to start c1x01, then c1scx, and then finally c1spx.  They all came up fine.

Status screen is now all green. I re-enabled damping on ETMX and it seems to be happy. A small kick of the optic shows the appropriately damped response.

  12148   Fri Jun 3 13:05:18 2016   ericq   Update   CDS   CDS Notes

Some CDS related things:


Keith Thorne has told us about a potential fix for our framebuilder woes. Jamie is going to be at the 40m next week to implement this, which could interfere with normal interferometer operation - so plan accordingly. 


I spent a little time doing some plumbing in the realtime models for Varun's audio processing work. Specifically, I tried to spin up a new model (C1DAF), running on the c1lsc machine. This included:

  • Removing the unused TT3 and TT4 parts from the IOO block in c1ass.mdl, freeing up some DAC outputs on the LSC rack
  • Adding an output row to the LSC input matrix which pipes to a shared memory IPC block. (This seemed like the simplest way for the DAFI model to have access to lots of signals with minimal overhead).
  • Removing two unused ADC inputs from c1lsc.mdl (that went to things like PD_XXX), to give c1daf.mdl the required two ADC inputs - and to give us the option of feeding in some analog signals.
  • Editing the rtsystab file to include c1daf in the list of models that run on c1lsc
  • Editing the existing DAFI .mdl file (which just looked like an old recolored cut-n-paste of c1ioo.mdl) to accept the IPC and ADC connections, and one DAC output that would go to the fibox. 

The simple DAFI model compiled and installed without complaint, but doesn't successfully start. For some reason, the frontend never takes the CPU offline. Jamie will help with this next week. Since things aren't working, these changes have not been committed to the userapps svn.

  3963   Mon Nov 22 13:16:52 2010   josephb   Summary   CDS   CDS Plan for the week

CDS Objectives for the Week:

Monday/Tuesday:

1) Investigate ETMX SD sensor problems

2) Fully check out the ETMX suspension and get that to a "green" state.

3) Look into cleaning up target directories (merge old target directory into the current target directory) and update all the slow machines for the new code location.

4) Clean up GDS apps directory (create link to opt/apps on all front end machines).

5) Get Rana his SENSOR, PERROR, etc channels.

Tuesday/Wednesday:

6) Install LSC IO chassis and necessary cabling/fibers.

7) Get LSC computer talking to its remote IO chassis

Wednesday:

8) If time, connect and start debugging Dolphin connection between LSC and SUS machines

 

  16208   Thu Jun 17 11:19:37 2021   Ian MacMillan   Update   CDS   CDS Upgrade

Jon and I tested the ADC and DAC cards in both of the systems on the test stand. We had to swap out an 18-bit DAC for a 16-bit one that worked, but now both machines have at least one working ADC and DAC.

[Still working on this post. I need to look at what is in the machines to say everything ]

  16217   Mon Jun 21 17:15:49 2021   Ian MacMillan   Update   CDS   CDS Upgrade

Anchal and I wrote a script (Attachment 1) that will test the ADC and DAC connections with inputs on the INMON from -3000 to 3000. We could not run it because some of the channels seemed to be frozen. 
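
The script itself is in Attachment 1; the gist of such a sweep looks something like the following pyepics sketch. The channel names here are hypothetical placeholders, not the ones actually used:

import time
import numpy as np
from epics import caget, caput   # pyepics

DAC_OFFSET = 'C1:TST-DAC_OUT_OFFSET'   # hypothetical excitation channel
ADC_INMON = 'C1:TST-ADC_IN_INMON'      # hypothetical readback channel

for counts in np.linspace(-3000, 3000, 31):
    caput(DAC_OFFSET, counts)
    time.sleep(0.2)                    # let the loopback settle
    print(f"{counts:8.1f} -> {caget(ADC_INMON)}")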

  17175   Thu Oct 6 12:02:21 2022   Anchal   Update   CDS   CDS Upgrade Plan

[Chris, Anchal]

Chris and I discussed our plan for the CDS upgrade, which amounts to moving the new FEs, new chiara, and new FB1 OS system to the martian network.


Preparation:

  • Chiara (clone) (will be called "New Chiara" henceforth) will be resynced to existing chiara to get all model and medm changes.
  • All models on New Chiara will be rebuilt and reinstalled.
  • All running services on existing chiara will be printed and stored for comparison later.
  • New Chiara's OS drive will be upgraded to Debian 10 and all services will be restored:
    • DHCP
    • DNS
    • NFS
    • rsync
  • Existing fb1 DAQ network card (10 Gbps Ethernet card) will be verified.
  • Make a list of all fb1 file system mounts and their UUIDs.

Upgrade plan:

Date: Fri Oct 7, 2022
Time: 11:00 am (After coffee and donuts)
Minimum required people: Chris, Anchal, JC (the more the merrier)

Steps:

  1. Ensure a snapshot of all channels is present from Oct 6th on New Chiara.
  2. Shutdown all machines:
    1. All slow computers (Except c1vac).
      Computer list: ssh into each computer and run the following (a batch sketch follows these steps):
      sudo systemctl stop modbusIOC.service
      sudo shutdown -h now
      1. c1susaux
      2. c1susaux2
      3. c1auxex
      4. c1auxey1
      5. c1psl
      6. c1iscaux
    2. All fast computers. Run on rossa:
      /cvs/cds/rtcds/caltech/c1/Git/40m/scripts/cds/stopAllModels.sh
      Disconnect left ethernet cables from the back of these computers.
    3. Power off all I/O chassis
    4. Swap the oneStop cables on all I/O chassis to fiber cables. On c1sus, connect the copper oneStop cable to teststand c1sus FE.
    5. Turn on all I/O chassis.
  3. Exchange chiaras.
    1. Connect old chiara to teststand network.
    2. Connect New Chiara to martian network.
    3. Turn on both old and new chiara.
    4. Ensure all services are running on New Chiara by comparing with the list made earlier during preparation.
  4. fb1.
    1. Move fb1(clone)'s OS drive into existing fb1 (on 1X6)
    2. Turn on fb1 (on 1X6).
    3. Ensure fb1 is mounting all its file systems correctly.
  5. New FEs
    1. Connect the network switch for new FEs to martian network. Make sure that old chiara is not connected to this same switch.
    2. Turn on the new FEs. All models should start on boot in sequence.
    3. Check if all models have green lights.
  6. Burt restore using latest snapshot available.
  7. Perform tests:
    1. Check if all local damping loops are working as before.
    2. Check if all IPC channels are transmitting and receiving correctly.
    3. Check if IMC is able to lock.
    4. Try single arm locking
    5. Try MICH locking.
  8. Make a contingency plan for how to revert to the old system if something fails.
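
Referring to step 2.1: the per-host shutdown can be batched from a workstation. A hedged Python sketch (host list from the plan above; assumes ssh keys and sudo rights are already set up):

import subprocess

slow_machines = ['c1susaux', 'c1susaux2', 'c1auxex',
                 'c1auxey1', 'c1psl', 'c1iscaux']   # c1vac deliberately excluded
for host in slow_machines:
    # stop the modbus IOC cleanly, then halt the machine
    subprocess.run(['ssh', host,
                    'sudo systemctl stop modbusIOC.service && sudo shutdown -h now'])
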
  17178   Fri Oct 7 22:45:15 2022   Anchal   Update   CDS   CDS Upgrade Status Update

[Chris, Anchal, JC, Paco, Yuta]

Quote:

Steps:

  1. Ensure a snapshot of all channels is present from Oct 6th on New Chiara.
  2. Shutdown all machines:
    1. All slow computers (Except c1vac).
      Computer List: ssh into the computers and run:
      sudo systemctl stop modbusIOC.service
      sudo shutdown -h now
      1. c1susaux
      2. c1susaux2
      3. c1auxex
      4. c1auxey1
      5. c1psl
      6. c1iscaux
    2. All fast computers. Run on rossa:
      /cvs/cds/rtcds/caltech/c1/Git/40m/scripts/cds/stopAllModels.sh
      Disconnect left ethernet cables from the back of these computers.
    3. Power off all I/O chassis
    4. Swap the oneStop cables on all I/O chassis to fiber cables. On c1sus, connect the copper oneStop cable to teststand c1sus FE.
    5. Turn on all I/O chassis.
  3. Exchange chiaras.
    1. Connect old chiara to teststand network.
    2. Connect New Chiara to martian network.
    3. Turn on both old and new chiara.
    4. Ensure all services are running on New Chiara by comparing with the list made earlier during preparation.

We finished all steps up to step 3 without any issue. We restarted all workstations to get the new nfs mount from New Chiara. Some other machines in the lab might require a restart too if they use nfs mounts. Note, c1sus was initially connected using a fiber oneStop cable that tested OK with the teststand IO chassis, but it still did not work with the c1sus chassis, and was reverted to a copper cable.


[Chris, Anchal, JC]

Quote:
  • fb1.
    1. Move fb1(clone)'s OS drive into existing fb1 (on 1X6)
    2. Turn on fb1 (on 1X6).
    3. Ensure fb1 is mounting all its file systems correctly.

While doing step 4, we realized that all 8 drive bays in the existing fb1 are occupied by disks that are managed by a hardware RAID controller (MegaRAID). All 8 hard disks seem to be combined into a single logical volume, which is then partitioned and appears to the OS as a 2 TB storage device (/dev/sda for OS) and a 23.5 TB storage device (/dev/sdb for frames). There was no free drive bay to install our OS drive from fb1 (clone), nor was there any already installed drive that we could identify as an "OS drive" and swap out without affecting access to the frame data. We tried to boot fb1 with the OS drive from fb1 (clone) using multiple SATA to USB cables, but it was not detected as a bootable drive. We then tried to put the OS drive back in fb1 (clone) and use the clone machine as the 40m framebuilder temporarily, in order to work on booting up fb1 in parallel with the rest of the upgrade. We found that fb1 (clone) would no longer boot from the drive either, as it had apparently lost (or never possessed?) its GRUB boot loader. The boot loader was reinstalled from the Debian 10 install thumbdrive, and then fb1 (clone) booted up and functioned normally, allowing the remainder of the upgrade to go forward.


[Chris, Jamie]

Jamie investigated the situation with the existing fb1, and found that there seem to be additional drive bays inside the fb1 chassis (not accessible from the front panel), in which the new OS disk could be installed and connected to open SATA ports on the motherboard. We can try this possible route to booting up fb1 and restoring access to past frames next week.


[Chris, Anchal]

Quote:
 

Steps:

  • New FEs
    • Connect the network switch for new FEs to martian network. Make sure that old chiara is not connected to this same switch.
    • Turn on the new FEs. All models should start on boot in sequence.
    • Check if all models have green lights.
  • Burt restore using latest snapshot available.
  • Perform tests:
    • Check if all local damping loops are working as before.
    • Check if all IPC channels are transmitting and receiving correctly.
    • Check if IMC is able to lock.

We carried out the rest of the steps up to 7.3. We started all the slow machines; some of them required reloading the daemons using:

sudo systemctl daemon-reload
sudo systemctl restart modbusIOC

We found that we were unable to ssh to c1psl, c1susaux, and c1iscaux. It turned out that chiara (clone) had a very outdated martian host table for the nameserver. Since Chris had introduced some new IPs for the IPMI ports, dolphin switch, etc., we could not simply copy back from the old chiara. So Chris used diff to go through all the changes and restored the DNS configuration.

We were able to burt restore to the Oct 7, 03:19 am point using the latest snapshot on New Chiara. All suspensions were being locally damped properly. We restarted megatron and optimus to get nfs mounts. All docker services are running normally, the IMC autolocker is working, and the FF slow PID is working as well. The PMC autolocker is also working fine. megatron's autoburt cron job is running properly and resumed creating snapshots from 6:19 pm onwards.


Remaining things to do:

  • Test basic IFO locking
  • Resume BHD commissioning activities.
  • Chris and Jamie will work on transferring the fb1 job to the real fb1. This will restore access to all past frames, which are not available right now.
  • Eventually, move the new FEs to 1X7 for the permanent move into the new CDS system.
  • After a few weeks of successful running, we can remove the old FEs and associated cables from the racks.
  17181   Mon Oct 10 10:14:05 2022   Chris   Update   CDS   CDS Upgrade remaining issues

List of remaining tasks to iron out various wrinkles with the upgrade:

The situation with the timing system is that we have not touched it at all in the upgrade, but have added a new diagnostic: Spectracom timing boards in each FE, to compare vs the ADC clock. So I expect that what we’re seeing now is not new, but may not have been noticed previously.

What we’re seeing is:

  • Timing status in the IOP statewords is red
  • The offset between FE timing and Spectracom is large: ~1000 µsec or greater. At the sites, this is typically much smaller, like 10 µsec
  • The offset is not stable (see first attachment). There are both excursions where it wanders off by tens of µsec and then comes back, as well as slips where it runs away by hundreds of µsec before settling down at a different level.

Possible issues stemming from unstable timing:

  • One of the timing excursions apparently glitched the ADC clock on c1ioo and c1sus2. Those IOPs are currently running with cycle time >15 µsec (see second attachment). This may mean the inputs on certain ADCs are delayed by an extra sample time, relative to the other ADCs
  • If the offset drifts too far, a discrepancy in the timestamps can glitch the IPCs and daqd
  17182   Tue Oct 11 10:58:36 2022   Chris   Update   CDS   CDS Upgrade remaining issues

The original fb1 now boots from its new drive, which is installed in a fixed drive bay and connected via SATA. There are no spare SATA power cables inside the chassis, so we’re temporarily powering it from an external power supply borrowed from a USB to SATA kit (see attachment).

The easiest way to eliminate the external supply would be to use a 4-pin Molex to SATA adapter, since the chassis has plenty of 4-pin Molex connectors available. Unfortunately those adapters sometimes start fires. The ones with crimped connectors are thought to be safer than those with molded connectors. However, the safest course will probably be to just migrate the OS partition onto the 2 TB device from the hardware RAID array, once we’re happy with it.

Historic data from the past frames should now be available, as should NDS2 access.

Starting to look into the timing issue:

  • The GPS receiver (EndRun Tempus LX) seems to have low signal from its antenna: it's sometimes losing lock with the satellites. We should find a way to log the signal strength and locked state of this receiver in EPICS and frames, perhaps using its SNMP interface (a polling sketch follows this list).
  • There was a BNC tee attached to the 1PPS output of the receiver. One cable was feeding the timing system, and another one went somewhere into the PSL (if I traced it correctly?). I removed the tee and connected the timing system directly to 1PPS.
  • I updated the firmware on the Tempus LX (but don’t expect this to make any difference)
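
As a starting point for the SNMP idea, the polling pattern with pysnmp would look roughly like the sketch below. The receiver address is a placeholder and the Tempus LX-specific OIDs for signal strength and lock state have to come from its manual; the standard sysUpTime OID is used here only to show the mechanics:

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

RECEIVER = '192.0.2.1'       # placeholder for the Tempus LX address
OID = '1.3.6.1.2.1.1.3.0'    # sysUpTime; swap in the EndRun lock-state OID

errInd, errStat, errIdx, varBinds = next(getCmd(
    SnmpEngine(), CommunityData('public', mpModel=0),
    UdpTransportTarget((RECEIVER, 161)), ContextData(),
    ObjectType(ObjectIdentity(OID))))
if errInd or errStat:
    print('SNMP error:', errInd or errStat.prettyPrint())
else:
    for name, value in varBinds:
        # from here the values could be pushed into EPICS soft channels
        print(name.prettyPrint(), '=', value.prettyPrint())
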
  17183   Tue Oct 11 11:27:58 2022   Koji   Update   CDS   CDS Upgrade remaining issues

Perhaps 1PPS around the PSL was used for the Rb standard to be locked to the GPS 1PPS.

If we need to drive multiple devices, we should use a fanout circuit to avoid distorting the 1PPS. -CW

  3127   Mon Jun 28 12:48:04 2010   josephb   Summary   CDS   CDS adapter board notes

The following is according to the drawing by Ben Abbott found at http://www.ligo.caltech.edu/~babbott/40m_sus_wiring.pdf

This applies to SUS:

Two ICS 110Bs.  Each has 2 (4 total) 44-conductor shielded cables going to the DAQ Interface Chassis (D990147-A).  See pages 2 and 4.

Three Pentek 6102 Analog Outputs to LSC Anti-Image Board (D000186 Rev A).  Each connected via 40 conductor ribbon cable (so 3 total). See page 5.

Eight XY220 to various whitening and dewhitening filters.  50 conductor ribbon cable for each (8 total). See page 10.

Three Pentek 6102 Analog Input to Op Lev interface board. 40 conductor ribbon cable for each (3 total).  See page 13.

 

The following look to be part of the AUX crate, and thus don't need replacement:

Five VMIC3113A to various Coil Drives, Optical Levers, and Whitening boards.  64 conductor ribbon cable for each (5 total). See page 11.

Three XY220 to various Coil boards. 50 conductor ribbon for each (3 total).  See page 11.

The following is according to the drawing by Jay found at http://www.ligo.caltech.edu/~jay/drawings/d020006-03.pdf

This applies to WFS and LSC:

Two XY220 to whitening 1 and 2 boards.  50 conductor ribbon for each (2 total).  See page 3.

Pentek 6102 to LSC Anti-image. 50 conductor ribbon. (1 total). See page 5.

 

The following are unclear if they belong to the FE or the Aux crate.  Unable to check the physical setup at the moment.

One VMIC3113A to LSC I & Q, RFAM, QPD INT. 64 conductor ribbon cable. (Total 1).  See page 4.

One XY220 to QPD Int.  50 conductor ribbon cable. (Total 1). See page 4.

 

The following look to be part of WFS, and aren't needed:

Two Pentek 6102 Analog Input to WFS boards. 40 conductor ribbon cables (2 Total). See page 1.

The following are part of the Aux crate, and don't need to be replaced:

Two VMIC3113A to Demods, PD, MC servo amp, PZT driver, Anti-imaging board. 64 conductor ribbon cable (2 Total). See page 3.

Two XY220 to Demods, MC Servo Amp, QPD Int boards.  50 conductor ribbon cable (2 Total). See page 3.

Three VMIC4116 to Demod and whitening boards.  50 conductor ribbon cable (3 Total). See page 3.

  3129   Mon Jun 28 21:26:05 2010   rana   Summary   CDS   CDS adapter board notes

Those drawings are an OK start, but it's obvious that things have changed at the 40m since 2002. We cannot rely on these drawings to determine all of the channel counts, etc.

I thought we had already been through all this... If not, we'll have to spend one afternoon going around and marking it all up.

  3156   Fri Jul 2 11:06:38 2010   josephb, kiwamu   Update   CDS   CDS and Green locking thoughts

Kiwamu and I went through and looked at the spare channels available near the PSL table and at the ends.

First, I noticed I need another 4 DB37 ADC adapter box, since there are 3 Pentek ADCs there, which I don't think Jay realized.

PSL Green Locking

Anyways, in the IOO chassis that will be put in, for the ADC we have 8 spare channels, which come in DB37 format.  So one option is to build an 8-BNC converter that plugs into that box.

The other option is to build 4-pin Lemo connectors and go in through the Sander box, which currently goes to the 110B ADC and has some spare channels.

For DAC at the PSL, the IOO chassis will have 8 spare DAC channels since there's only 1 Pentek DAC.  This would be in IDC40 cable format, since that's what the blue DAC adapter box takes.  An 8-channel DAC to 40-pin IDC box would need to be built.

 

End Green Locking

The ends have 8 spare DAC channels, again on 40-pin IDC cable.  A box similar to the 8-channel DAC box for the PSL would need to be built.

The ends also have spare 4-pin Lemo capacity.  It looked like there were 10 channels or so still unused, so Lemo connections would need to be made.  There don't appear to be any spare DB37 connectors available on the adapter box, so Lemo via the Sander box is the only way.

 

Notes

Joe needs to provide Kiwamu with cabling pin outs.

If Kiwamu makes a couple of spares of the 8-BNC-to-DB37 connector boards, there's a spare DB37 ADC input on the SUS machine we could use, providing 8 more channels for test use.

  13837   Sun May 13 15:15:18 2018   gautam   Update   General   CDS crash

I found the c1lsc machine to be completely unresponsive today. Looking at the trend of the state word, it happened sometime yesterday (Saturday). The usual reboot procedure did not work - I am not able to bring back any of the models on any of the machines; during the restart procedure, they all fail. The logfile reads (for the c1ioo front end, but they all behave the same):

[  309.783460] c1x03: Initializing space for daqLib buffers
[  309.887357] CPU 2 is now offline
[  309.887422] c1x03: Sync source = 4
[  309.887425] c1x03: Waiting for EPICS BURT Restore = 2
[  309.946320] c1x03: Waiting for EPICS BURT 0
[  309.946320] c1x03: BURT Restore Complete
[  309.946320] c1x03: Corrupted Epics data:  module=0 filter=1 filterType=0 filtSections=134610112
[  309.946320] c1x03: Filter module init failed, exiting
[  363.229086] c1x03: Setting stop_working_threads to 1
[  364.232148] DXH Adapter 0 : BROADCAST - dx_user_mcast_unbind - mcgroupid=0x3
[  364.233689] Will bring back CPU 2
[  365.236674] Booting Node 1 Processor 2 APIC 0x2
[  365.236771] smpboot cpu 2: start_ip = 9a000
[  309.946320] Calibrating delay loop (skipped) already calibrated this CPU
[  365.251060] NMI watchdog enabled, takes one hw-pmu counter.
[  365.252135] Brought the CPU back up
[  365.252138] c1x03: Just before returning from cleanup_module for c1x03

Not sure what is going on here, or what "Corrupted EPICS data" is supposed to mean. Thinking that something was messed up the last time the model was compiled, I tried recompiling the IOP model. But I'm not able to even compile the model; it fails with the error message

make[1]: Leaving directory '/opt/rtcds/caltech/c1/rtbuild/3.4'
make[1]: /cvs/cds/rtapps/epics-3.14.12.2_long/modules/seq/bin/linux-x86_64/snc: Command not found
make[1]: *** [build/c1x03epics/c1x03.c] Error 127
Makefile:28: recipe for target 'c1x03' failed
make: *** [c1x03] Error 1

I suspect this is some kind of path problem - the EPICS_BASE bash variable is set to /cvs/cds/rtapps/epics-3.14.12.2_long/base on the FEs, while /cvs isn't even mounted on the FEs (nor do I think it should be). I think the correct path should be /opt/rtapps/epics-3.14.12.2_long/base. Why should this have changed?
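
A quick sanity check one could run on an FE to confirm the suspicion (paths from the error messages above):

import os

base = os.environ.get('EPICS_BASE', '')
print('EPICS_BASE =', base)
# the snc binary the build fails on lives beside base, under modules/seq
snc = base.replace('/base', '/modules/seq/bin/linux-x86_64/snc')
print('snc exists on this host:', os.path.exists(snc))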

I've shutdown all watchdogs until this is resolved.

  13838   Sun May 13 17:31:51 2018   gautam   Update   General   CDS crash

As suspected, this was indeed a path problem. Johannes will elog about it later, but in short, it is related to some path variables being changed in order to try and streamline the EPICS processes on the new c1auxex machine (Acromag era). It is confusing that futzing around with the slow computing system messes with the realtime system as well - aren't these supposed to be decoupled? Once the paths were restored by Johannes, everything compiled and restarted fine. We even have a beam on the AS camera, which was what triggered this whole thing.

Anyways, Attachment #1 shows the current status. I am puzzled by the red TIMING indicators on the c1x04 and c1x02 processes; they are absent from all other processes. How can this be debugged further?

Quote:
 

I suspect this is some kind of path problem - the EPICS_BASE bash variable is set to /cvs/cds/rtapps/epics-3.14.12.2_long/base on the FEs, while /cvs isn't even mounted on the FEs (nor do I think it should be). I think the correct path should be /opt/rtapps/epics-3.14.12.2_long/base. Why should this have changed?

  13839   Sun May 13 20:48:38 2018   johannes   Update   General   CDS crash

I think the root of the problem is that the /opt/rtapps/ and /cvs/cds/rtapps/ mounting locations point to the same directory on the nfs server. Gautam and I were cleaning up the /cvs/cds/caltech/target/ directory, placing the previous contents of /cvs/cds/caltech/target/c1auxex/, including database files and startup instructions, in /cvs/cds/caltech/target/c1auxex_oldVME/, and then moved /cvs/cds/caltech/target/c1auxex2/, which has the channel database and initialization files for the Acromag DAQ, to /cvs/cds/caltech/target/c1auxex/.

This also required updating the systemd entries on c1auxex to point to the changed directory. While confirming that everything worked as before, we noticed that upon startup the EPICS IOC complains about not being able to find the caRepeater binary. This was not new and has not limited DAQ functionality in the past, but we wanted to fix it, as it seemed to be a simple PATH issue. While the paths are all correctly defined in the user login shell, systemd runs on a lower level and doesn't know about them. One thing we tried was to let systemd execute /cvs/cds/rtapps/epics-3.14.12.2_long/etc/epics-user-env.sh to initialize EPICS. It was strange that the content of that file was pointing to /opt/rtapps/epics-3.14.12.2_long/base, which is not mounted on the slow machines, so we changed /opt/ to /cvs/cds/, not realizing that the frontends read from the same directory (as Gautam said, /cvs/cds does not exist as a mount point on the frontends). It ended up not working this way, and apparently I forgot to change it back during cleanup. But worse, I never elogged it!

In the end, we managed to give systemd the correct path definitions by explicitly calling them out in /cvs/cds/caltech/target/c1auxex/ETMXenv, to which a reference was added in the systemd service file. The caRepeater warning no longer appears.
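
For the record, the wiring presumably looks something like this excerpt; the unit file name is an assumption, while the EnvironmentFile path is from this entry:

# /etc/systemd/system/modbusIOC.service (illustrative excerpt)
[Service]
# pull in the PATH/EPICS definitions before the IOC starts
EnvironmentFile=/cvs/cds/caltech/target/c1auxex/ETMXenv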

  15791   Tue Feb 2 23:29:35 2021   Koji   Update   CDS   CDS crash and CDS/IFO recovery

I worked around the racks and the feedthru flanges this afternoon and evening. This inevitably crashed c1lsc real-time process.
Rebooting c1lsc caused multiple crashes (as usual) and I had to hard-reboot c1lsc/c1sus/c1ioo.
This made the "DC" indicator of the IOPs for these hosts **RED**.

This looked like the usual timing issue. It looked like "ntpdate" is not available in the new system. (When was it updated?)

The hardware clocks (RTC) of these hosts were set to PST while the functional end host showed UTC. So I copied the UTC time from the end machine to the vertex machines.
For the time adjustment, the standard "date" command was used

> sudo date -s "2021-02-03 07:11:30"

This did the trick. Once the IOP was restarted, the "DC" indicators returned to **Green**; restarting the other processes was straightforward and now the CDS indicators are all green.

controls@c1iscex:~ 0$ timedatectl
      Local time: Wed 2021-02-03 07:35:12 UTC
  Universal time: Wed 2021-02-03 07:35:12 UTC
        RTC time: Wed 2021-02-03 07:35:26
       Time zone: Etc/UTC (UTC, +0000)
     NTP enabled: yes
NTP synchronized: no
 RTC in local TZ: no
      DST active: n/a

NTP synchronization is not active. Is this OK?


With the recovered CDS, the IMC was immediately locked and the autolocker started to function after a few pokes (like manually running the "mcup" script). However, I didn't see any light on the AS/REFL cameras or on the test mass faces. I'm sure the IMC alignment is OK. This means the TTs are not well aligned.

So, I burt-restored c1assepics with the 12:19 snapshot. This immediately brought back the spots on REFL/AS.

Then the arms were aligned, locked, and ASSed. I tried to lock the FP arms. The transmissions were at the level of 0.1~0.3, so some manual alignment of ITMY and BS was necessary. After getting TRs of ~0.8, I still could not lock the arms. The signs of the servo gains were flipped to -0.143 for the X arm and -0.012 for the Y arm, and the arms were locked. ASS worked well and the ASS offsets were offloaded to the SUSs.

 

  15792   Wed Feb 3 15:24:52 2021   gautam   Update   CDS   CDS crash and CDS/IFO recovery

Didn't get a chance to comment during the meeting - This was almost certainly a coincidence. I have never had to do this - I assert, based on the ~10 labwide reboots I have had to do in the last two years, that whether the timing errors persist on reboot or not is not deterministic. But this is beyond my level of CDS knowledge and so I'm happy for Rolf / Jamie to comment. I use the reboot script - if that doesn't work, I use it again until the systems come back without any errors.

Quote:

This looked like the usual timing issue. It looked like "ntpdate" is not available in the new system. (When was it updated?)

The hardware clocks (RTC) of these hosts were set to PST while the functional end host showed UTC. So I copied the UTC time from the end machine to the vertex machines.
For the time adjustment, the standard "date" command was used

> sudo date -s "2021-02-03 07:11:30"

This did the trick. Once the IOP was restarted, the "DC" indicators returned to **Green**; restarting the other processes was straightforward and now the CDS indicators are all green.

I don't think this is a problem; the NTP synchronization is handled by timesyncd now.

Quote:

NTP synchronization is not active. Is this OK?

I defer restoring the LSC settings etc since I guess there is not expected to be any interferometer activity for a while.

  15794   Wed Feb 3 18:53:31 2021   Koji   Update   CDS   CDS crash and CDS/IFO recovery

Really!? I didn't reboot the machines between "sudo date" and "rtcds start c1x0*". I tried rtcds; if it didn't work, I used date, then tried rtcds again (repeat). The time was not synced at all: besides the TZ problem, there was a 1~3 sec offset.

 

  15795   Wed Feb 3 21:28:02 2021   gautam   Update   CDS   CDS crash and CDS/IFO recovery

I am just reporting my experience - this may be a new failure mode but I don't think so. In the new RTCDS, the NTP server for the FEs is the FB, to which they are synced by timesyncd. The FB machine itself has the ntpd service installed, and so is capable of syncing to an NTP server over the internet while also serving as an NTP server for the FEs. The timesyncd daemon may not have started correctly, or the NTP serving from FB got interrupted (for example), but that's all just speculation.

  13198   Fri Aug 11 19:34:49 2017   Jamie   Update   CDS   CDS final bits status update

So it appears we now have full frames and second, minute, and minute_raw trends.

We are still not able to raise test points with daqd_rcv (e.g. the NDS1 server), which is why dataviewer and nds2-client can't get test points on their own.

We were not able to add the EDCU (EPICS client) channels without daqd_fw crashing.

We have a new kernel image that's supposed to solve the module unload instability issue.  In order to try it we'll need to restart the entire system, though, so I'll do that on Monday morning.

I've got the CDS guys investigating the test point and EDCU issues, but we won't get any action on that until next week.

Quote:

Remaining unresolved issues:

  • IFO needs to be fully locked to make sure ALL components of all models are working.
  • The remaining red status lights are from the "FB NET" diagnostics, which are reflecting a missing status bit from the front end processes due to the fact that they were compiled with an earlier RCG version (3.0.3) than the mx_streams were (3.3+/trunk).  There will be a new release of the RTS soon, at which point we'll compile everything from the same version, which should get us all green again.
  • The entire system has been fully modernized, to the target CDS reference OS (Debian jessie) and more recent RCG versions.  The management of the various RTS components, both on the front ends and on fb, have as much as possible been updated to use the modern management tools (e.g. systemd, udev, etc.).  These changes need to be documented.  In particular...
  • The fb daqd process has been split into three separate components, a configuration that mirrors what is done at the sites and appears to be more stable:
    • daqd_dc: data concentrator (receives data from front ends)
    • daqd_fw: receives frames from dc and writes out full frames and second/minute trends
    • daqd_rcv: NDS1 server (raises test points and receives archive data from frames from 'nds' process)
    The "target" directory for all of these new components is:
    • /opt/rtcds/caltech/c1/target/daqd
    All of these processes are now managed under systemd supervision on fb, meaning the daqd restart procedure has changed.  This needs to be simplified and clarified.
  • Second trend frames are being written, but for some reason they're not accessible over NDS.
  • Have not had a chance to verify minute trend and raw minute trend writing yet.  Needs to be confirmed.
  • Get wiper script working on new fb.
  • Front end RTS kernel will occasionally crash when the RTS modules are unloaded.  Keith Thorne apparently has a kernel version with a different set of patches from Gerrit Kuhn that does not have this problem.  Keith's kernel needs to be packaged and installed in the front end diskless root.
  • The models accessing the dolphin shared memory will ALL crash when one of the front end hosts on the dolphin network goes away.  This results in a boot fest of all the dolphin-enabled hosts.  Need to figure out what's going on there.
  • The RCG settings snapshotting has changed significantly in later RCG versions.  We need to make sure that all burt backup type stuff is still working correctly.
  • Restoration of /frames from old fb SCSI RAID?
  • Backup of entirety of fb1, including fb1 root (/) and front end diskless root (/diskless)
  • Full documentation of rebuild procedure from Jamie's notes.
  4323   Fri Feb 18 13:41:22 2011   josephb   Update   CDS   CDS fixes

I talked to Alex today and had two things fixed:

First, the maximum length of filter names (in the foton C1SYS.txt files in /chans) has been increased to 40, from 20.  This does not increase the EPICS channel name length (which is longer than 20 anyway).

This should prevent running into the case where the model doesn't complain when compiled, but we can't load filters.

Additionally, we modified the feCodeGen.pl script in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util/ to correctly generate names for filters in all cases.  There was a problem where the C1 was being left off the file name when, in the Simulink .mdl file, the filter was located in a box which had "top_names" set.

  13729   Thu Apr 5 10:38:38 2018   gautam   Update   CDS   CDS puzzle

I'm probably doing something stupid - but I've not been able to figure this out. In the MC1 and MC3 coil driver filter banks, we have a digital "28HzELP" filter module in FM9. Attachment #1 shows the MC1 filterbanks. In the shown configuration, I would expect the only difference between the "IN1" and "OUT" testpoints to be the transfer function of said ELP filter; after all, it is just a bunch of multiplications by filter coefficients. But yesterday, looking at some DTT traces, their shapes looked suspicious. So today, I did the analysis entirely offline (the motivation being to rule out DTT weirdness) using scipy's welch. Attachment #2 shows the ASDs of the IN1 and OUT testpoint data (collected for 30 s, FFT length set to 2 seconds, Hanning window from scipy). I've also plotted the "expected" spectral shape, by loading the sos coefficients from the filter file and using scipy to compute the transfer function.

Clearly, there is a discrepancy for f>20Hz. Why?

Code used to generate this plot (and also a datafile to facilitate offline plotting) is attached in the tarball Attachment #3. Note that I am using a function from my Noise Budget repo to read in the Foton filter file...

*ChrisW suggested ruling out spectral leakage. I re-ran the script with (i) 180 seconds of data, (ii) an FFT length of 15 seconds, and (iii) a Blackman-Harris window instead of Hanning. Attachment #4 shows a similar discrepancy between expectation and measurement...
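
For anyone repeating this analysis, the whole chain can be checked end-to-end on synthetic data, where measurement and expectation must agree up to leakage. A self-contained scipy sketch with a stand-in elliptic low-pass (not the actual Foton 28HzELP coefficients):

import numpy as np
from scipy import signal

fs = 2048                                               # stand-in sample rate (Hz)
sos = signal.ellip(4, 1, 40, 28, fs=fs, output='sos')   # stand-in 28 Hz ELP
x = np.random.randn(30 * fs)                            # 30 s of white noise as "IN1"
y = signal.sosfilt(sos, x)                              # "OUT" = IN1 through the filter

f, Pxx = signal.welch(x, fs=fs, window='hann', nperseg=2 * fs)
f, Pyy = signal.welch(y, fs=fs, window='hann', nperseg=2 * fs)
_, h = signal.sosfreqz(sos, worN=f, fs=fs)

measured = np.sqrt(Pyy / Pxx)    # |TF| estimated from the two ASDs
# `measured` should track np.abs(h); where it flattens out in the stopband,
# the estimate is limited by spectral leakage from the analysis window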

  13733   Fri Apr 6 10:00:29 2018   gautam   Update   CDS   CDS puzzle
Quote:

Clearly, there is a discrepancy for f>20Hz. Why?

Spectral leakage

  2911   Tue May 11 16:38:16 2010   josephb, rana, rolf   Update   CDS   CDS questions and thoughts

1) What is c1asc doing?  What is ascaux used for?  What are the cables labeled "C1:ASC_QPD" in the 1X2 rack really going to?

2) Put the 4600 machine (megatron) in 1Y3 (away from the analog electronics). This can be used as an OAF/IO machine. We need a dolphin fiber link from this machine to the IO chassis which will presumably be in 1Y1, 1Y2 (we do not currently have this fiber at the 40m, although I think Rolf said something about having one).

3) Merge the PSL and IOO VME crates in 1Y1/1Y2 to make room for the IO chassis.

4) Put the LSC and SUS machines into 1Y4 and/or 1Y5 along with the SUS IO chassis.  The dolphin switch would also go here.

5) Figure out space in 1X3 for the LSC chassis.  The most likely option is pulling asc or ascaux stuff, assuming it's not really being used.

6) Are we going to move the OMC computer out from under the beam tube and into an actual rack?  If so, where?

 

Rolf will likely be back Friday, when we aim to start working on the "New" Y end and possibly the 1X3 rack for the LSC chassis.

 

  13669   Thu Mar 8 01:10:22 2018   gautam   Update   General   CDS recovery after work at LSC rack

This required multiple hard reboots, but seems like all the RT models are back for now. The only indicator I can't explain is the red DC field on c1oaf. Also, the SUS model seems to be overclocking more frequently than usual, though I can't be sure. The "timing" field of this model's state word is RED, while the other models all seem fine. Not sure what could be going on.

Will debug further tomorrow, when I probably will have to do all this again as I'll need to recompile c1lsc for the ALS electronics test with the new ADC card from the differential AA board.

  13670   Thu Mar 8 14:41:25 2018   gautam   Update   General   CDS recovery after work at LSC rack

As I had found before, restarting the c1oaf model fixed the DC error. There is however still a pesky red indicator light on the "ADC0" in c1oaf. Trying to open up the ADC MEDM screen to investigate this further leads to the blank screen on the bottom right of Attachment #1. Probably has something to do with the fact that the model has an ADC block (because every model needs one?) but no signals are actually being piped to the model directly from the ADC.

Another observation, though I don't have any hypothesis as to why this was happening: on the c1sus machine, the c1sus model would frequently overclock, and then eventually, crash. I observed this behaviour at least 3 times between last night and now. The other models seemed fine though, in fact, IMC stayed locked. Why should this have been the case? It remains to be seen if this was somehow connected to the red DC indicator on c1oaf, though why should this be the case? Isn't the DC just concerned with writing data to frames? Any sort of IPC should be independent? Attachment #2 shows that there's been a definite increase in the maximum time on c1sus clock-cycle since yesterday (it's a 10 day minute trend plot of the model clock cycle timing and also the maximum time). Why? Koji and I did switch off all the Sorensens at the LSC rack for about 30mins, but why should this affect anything at 1X6? There are no red lights in either the c1lsc or c1sus expansion chassis. Curiously, the PRM also seems to be glitchy - as I'm sitting in the control room, I see a spot flashing across vertically on the REFL CRT monitor sporadically. Note that nominally, with PRM misaligned, the REFL CRT should be dark. dmesg on c1sus doesn't shed any light on the issue.

Seems like some high level voodoo.


Edit 3:30pm: The model just crashed again. dmesg rather unhelpfully just says "ADC timeout". Unclear how to debug further. See Attachment #3.

Quote:

This required multiple hard reboots, but seems like all the RT models are back for now. The only indicator I can't explain is the red DC field on c1oaf. Also, the SUS model seems to be overclocking more frequently than usual, though I can't be sure. The "timing" field of this model's state word is RED, while the other models all seem fine. Not sure what could be going on.

Will debug further tomorrow, when I probably will have to do all this again as I'll need to recompile c1lsc for the ALS electronics test with the new ADC card from the differential AA board.

  13672   Thu Mar 8 18:15:42 2018   gautam   Update   General   CDS recovery after work at LSC rack

I was forced into a simultaneous power-cycle rebooting of the three vertex FEs just now. I took the opportunity to completely disconnect the c1sus expansion chassis from all power and then restart it.

Everything is back up right now, and the weird timing issues I noticed in the sus model seem to be gone now (I'll need a longer baseline to be sure and I'll post a trend of the CPU timing tomorrow). It's disconcerting that apparently the only way to get everything back up and running is the nuclear option of power-cycling all FE related electronics. I was considering borrowing an ADC adapter card from the Y end and measuring the calibrated IR ALS noise with the digital system, but if I'm going to have to go through this whole dance each time I do a model recompile on c1lsc (which I'm going to have to in order to get the extra ADC recognized), I'm wondering if it's just better to wait till we get the new adapter cards we ordered. I think I'm going to work on tuning the input coupling into the fiber at EX in the next couple of days instead.

Quote:
 

Seems like some high level voodoo.


Edit 3:30pm: The model just crashed again. dmesg rather unhelpfully just says "ADC timeout". Unclear how to debug further. See Attachment #3.

 

  13477   Thu Dec 14 19:41:00 2017   gautam   Update   CDS   CDS recovery, NFS woes

[Koji, Jamie(remote), gautam]

Summary: The CDS system seems to be back up and functioning. But there seems to be some pending problems with the NFS that should be looked into.

We locked the Y-arm and hand-aligned the transmission to 1. Some pending problems with the ASS model (possibly symptomatic of something more general). Didn't touch the X-arm because we don't know what exactly the status of ETMX is.

Problems raised in elogs in the thread of 13474 and also 13436 seem to be solved.


I would make a detailed post with how the problems were fixed, but unfortunately, most of what we did was not scientific/systematic/repeatable. Instead, I note here some general points (Jamie/Koji can add to/correct me):

  1. There is a "known" problem with unloading models on c1lsc. Sometimes, running rtcds stop <model> will kill the c1lsc frontend.
  2. Sometimes, when one machine on the dolphin network goes down, all 3 go down.
  3. The new FB/RCG means that some of the old commands no longer work. Specifically, telnet fb 8087 followed by shutdown (to fix DC errors) no longer works. Instead, ssh into fb1 and run sudo systemctl restart daqd_*.
  4. Timing error on c1sus machine was linked to the mx_stream processes somehow not being automatically started. The "!mxstream restart" button on the CDS overview MEDM screen should run the necessary commands to restart it. However, today, I had to manually run sudo systemctl start mx_stream on c1sus to fix this error. It is a mystery why the automatic startup of this process was disabled in the first place. Jamie has now rectified this problem, so keep an eye out.
  5. c1oaf persistently reported DC errors (0x2bad) that couldn't be addressed by running mxstream restart or restarting the daqd processes on FB1. Restarting the model itself (i.e. rtcds restart c1oaf) fixed this issue (though of course I took the risk of having to go into the lab and hard-reboot 3 machines).
  6. At some point, we thought we had all the CDS lights green - but at that point, the END FEs crashed, necessitating Koji->EX and Gautam->EY hard reboots. This is a new phenomenon. Note that the vertex machines were unaffected.
  7. At some point, all the DC lights on the CDS overview screen went white - at the same time, we couldn't ssh into FB1, although it was responding to ping. After ~2mins, the green lights came back and we were able to connect to FB1. Not sure what to make of this.
  8. While trying to run the dither alignment scripts for the Y-arm, we noticed some strange behaviour:
    • Even when there was no signal (looking at EPICS channels) at the input of the ASS servos, the output was fluctuating wildly by ~20cts-pp.
    • This is not simply an EPICS artefact, as we could see corresponding motion of the suspension on the CCD.
    • A possible clue is that when I run the "Start Dither" script from the MEDM screen, I get a bunch of error messages (see Attachment #2).
    • Similar error messages show up when running the LSC offset script for example. Seems like there are multiple ports open somehow on the same machine?
    • There are no indicator lights on the CDS overview screen suggesting where the problem lies.
    • Will continue investigating tomorrow.

Some other general remarks:

  1. ETMX watchdog remains shutdown.
  2. ITMY and BS oplevs have been hijacked for HeNe RIN / Oplev sensing noise measurement, and so are not enabled.
  3. Y arm trans QPD (Thorlabs) has large 60Hz harmonics. These can be mitigated by turning on a 60Hz comb filter, but we should check if this is some kind of ground loop. The feature is much less evident when looking at the TRANS signal on the QPD.

UPDATE 8:20pm:

Koji suggested trying to simply restart the ASS model to see if that fixes the weird errors shown in Attachment #2. This did the trick. But we are now faced with more confusion - during the restart process, the various indicators on the CDS overview MEDM screen froze up, which is usually symptomatic of the machines being unresponsive and requiring a hard reboot. But we waited for a few minutes, and everything mysteriously came back. Over repeated observations and looking at the dmesg of the frontend, the problem seems to be connected with an unresponsive NFS connection. Jamie had noted some time ago that the NFS seems unusually slow. How can we fix this problem? Is it feasible to have a dedicated machine that is not FB1 do the NFS serving for the FEs?

  13480   Fri Dec 15 01:53:37 2017   jamie   Update   CDS   CDS recovery, NFS woes
Quote:

I would make a detailed post with how the problems were fixed, but unfortunately, most of what we did was not scientific/systematic/repeatable. Instead, I note here some general points (Jamie/Koji can add to/correct me):

  1. There is a "known" problem with unloading models on c1lsc. Sometimes, running rtcds stop <model> will kill the c1lsc frontend.
  2. Sometimes, when one machine on the dolphin network goes down, all 3 go down.
  3. The new FB/RCG means that some of the old commands no longer work. Specifically, telnet fb 8087 followed by shutdown (to fix DC errors) no longer works. Instead, ssh into fb1 and run sudo systemctl restart daqd_*.

This should still work, but the address has changed.  The daqd was split up into three separate binaries to get around the issue with the monolithic build that we could never figure out.  The address of the data concentrator (DC) (which is the thing that needs to be restarted) is now 8083.

Quote:

UPDATE 8:20pm:

Koji suggested trying to simply restart the ASS model to see if that fixes the weird errors shown in Attachment #2. This did the trick. But we are now faced with more confusion - during the restart process, the various indicators on the CDS overview MEDM screen froze up, which is usually symptomatic of the machines being unresponsive and requiring a hard reboot. But we waited for a few minutes, and everything mysteriously came back. Over repeated observations and looking at the dmesg of the frontend, the problem seems to be connected with an unresponsive NFS connection. Jamie had noted some time ago that the NFS seems unusually slow. How can we fix this problem? Is it feasible to have a dedicated machine that is not FB1 do the NFS serving for the FEs?

I don't think the problem is fb1.  The fb1 NFS is mostly only used during front end boot.  It's the rtcds mount that's the one that sees all the action, which is being served from chiara.

  13481   Fri Dec 15 11:19:11 2017   gautam   Update   CDS   CDS recovery, NFS woes

Looking at the dmesg on c1iscex for example, at least part of the problem seems to be associated with FB1 (192.168.113.201, see Attachment #1). The "server" can be unresponsive for O(100) seconds, which is consistent with the duration for which we see the MEDM status lights go blank, and the EPICS records get frozen. Note that the error timestamped ~4000 was from last night, which means there have been at least 2 more instances of this kind of freeze-up overnight.

I don't know if this is symptomatic of some more widespread problem with the 40m networking infrastructure. In any case, all the CDS overview screen lights were green today morning, and MC autolocker seems to have worked fine overnight.

I have also updated the wiki page with the updated daqd restart commands.

Unrelated to this work - Koji fixed up the MC overview screen such that the MC autolocker button is now visible again. The problem seems to have been due to my migrating some of the c1ioo EPICS channels from the slow machine to the fast system, as a result of which the EPICS variable type changed from "ENUM" to something that was not "ENUM". In any case, the button exists now, and the MC autolocker blinky light is responsive to its state.

Quote:

I don't think the problem is fb1.  The fb1 NFS is mostly only used during front end boot.  It's the rtcds mount that's the one that sees all the action, which is being served from chiara.

 

  9449   Fri Dec 6 21:38:27 2013 KojiUpdateLSCCDS related activities for LSC

I worked on the CDS related stuffs for LSC yesterday and today.


1. Slow machines:

I checked the database files for c1iscaux and c1iscaux2 (slow machines). They are mainly
used for the control of the LSC whitening filters. The channel names had become totally random:
we had reconfigured the RF PDs while the channel names were left unchanged.

- The database has now been modified so that the PD names and the channels correspond.
- saverestore.req and autoBurt.req were also changed accordingly.

- PD interface channels are completely random. Don't use them.
- I found the whitening of the DCPDs is not effective.

- We need to clean up the /cvs/cds/caltech/target directory. The autoBurt requests in the old targets
are making unnecessary burt files.
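To find which of the old targets still carry burt requests, something like this may help (a sketch):

# list autoBurt request files under the old target directory;
# each is a candidate source of the unnecessary burt files
find /cvs/cds/caltech/target -name 'autoBurt*.req'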

2. LSC screens

- The channel names on the LSC OVERVIEW screen were modified. (Attachment 1)
- A new LSC Whitening screen was made. (Attachment 2)

3. LSC screen generator

Editing the main LSC screen directly is very tough, so the screen was split into several sub-screens
that are combined with a command.
/opt/rtcds/caltech/c1/medm/c1lsc/master/generateLSCscreen/generateLSCscreen.py

This command combines the multiple adl files into a single file, applying x & y offsets.
This way, you can work on each section of the screen separately.
Also, moving the blocks around is easy.
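Usage is presumably just the following (a sketch; the exact invocation and output file are assumptions):

cd /opt/rtcds/caltech/c1/medm/c1lsc/master/generateLSCscreen
python generateLSCscreen.py   # regenerates the combined LSC OVERVIEW adl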

4. LSC Code Bug?

While making the screens, I found that a couple of the whitening switches are not
working properly.
E.g., when AS165 (either I or Q) FM1 is activated through the whitening trigger,
the MSB (bit15) of the binary I/O (C1:LSC-BIO_0_0) does not toggle.

Similarly, ASDC FM1 does not toggle bit15 of C1:LSC-BIO_0_1.

The other channels seem OK.

At first, I thought this was a bug in the "Bit2Word" block. But an individual test of the block showed that
the block is not guilty. So why is only bit15 malfunctioning???
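One arithmetic observation, offered as an illustration rather than a diagnosis: bit15 is the sign bit of a 16-bit signed integer, so it is the only bit whose weight falls outside the positive range of such a word:

printf '%d\n' $(( 1 << 15 ))   # 32768 -- one more than the largest signed 16-bit value (32767)
printf '%d\n' $(( 1 << 14 ))   # 16384 -- still representable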

 

  3034   Wed Jun 2 11:25:16 2010 josephb,alexUpdateCDSCDS saga (aka the bad code saga)

Alex updated the awg.par file to handle all the testpoints.  Basically it's very similar to testpoint.par, but the prognum lines have to be 1 higher than the corresponding prognum in testpoint.par.  An entry looks like:

[C1-awg0]
hostname=192.168.1.2
prognum=0x31001002
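For comparison, the corresponding testpoint.par entry would then carry a prognum one lower (the section name here is an assumption):

[C1-tp0]
hostname=192.168.1.2
prognum=0x31001001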

After running "diag -i" and seeing some RPC number conflicts, we went into /cvs/cds/caltech/cds/target/gds/param/diag_C.conf and changed the line from

&chn * *  192.168.1.2 822087685 1

to

&chn * *  192.168.1.2 822087700 1

The number represents an RPC number.  This was conflicting with the RPC number associated with the awgtpman processes.  We then had to update the /etc/rpc file as well: in it, we changed chnconf 822087685 to chnconf 822087700.  We then ran /usr/sbin/xinetd reload.
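Two commands that help when chasing such conflicts (a sketch):

# list the RPC program numbers currently registered on the front end
rpcinfo -p 192.168.1.2
# the numbers in diag_C.conf and /etc/rpc are decimal; convert to hex
# to compare against the prognum entries in awg.par / testpoint.par
printf '0x%x\n' 822087685   # -> 0x31001005
printf '0x%x\n' 822087700   # -> 0x31001014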

Lastly we edited the /etc/xinetd.d/chnconf file line

server_args             = /cvs/cds/caltech/target/gds/param/tpchn_C4.par /cvs/cds/caltech/target/gds/param/tpchn_C5.par

to

server_args             = /cvs/cds/caltech/target/gds/param/tpchn_C1.par /cvs/cds/caltech/target/gds/param/tpchn_C2.par /cvs/cds/caltech/target/gds/param/tpchn_C3.par /cvs/cds/caltech/target/gds/param/tpchn_C4.par /cvs/cds/caltech/target/gds/param/tpchn_C5.par /cvs/cds/caltech/target/gds/param/tpchn_C6.par /cvs/cds/caltech/target/gds/param/tpchn_C7.par /cvs/cds/caltech/target/gds/param/tpchn_C8.par /cvs/cds/caltech/target/gds/param/tpchn_C9.par

 

Alex also recompiled the frame builder code to be able to handle more than 7 front ends.  This involved tracking down a newer version of libtestpoint.so on c1iscex and moving it over to megatron, then editing the code by hand to add the ability to have up to 10 front ends connected.

Alex has said he doesn't like this code and would like it to dynamically allocate properly for any number of servers rather than having a dumb hard coded limit.

Other changes he needs to make:

1) Get rid of set dcu_rate ## = 16384 type lines in the daqrc file.  That information is available from the /caltech/chans/C1LSC.ini type files which are automatically generated when you compile a model.  This means not having to go in by hand to update these in daqrc.

2) Get some awg.par and testpoint.par rules, so that these are automatically updated when you build a model.  Make it so it automatically assigns a prognum when read in, rather than having to hard code them in by hand.

3) Slave the awgtpman processes to a single clock running from the IO processor x00. This ensures they are all in sync.

 

 

 

  14149   Thu Aug 9 12:31:13 2018 gautamUpdateCDSCDS status update

The model seems to have run without issues overnight. Not completely related, but the MC1 shadow sensor signals also don't show any abnormal excursions to negative values in the last 48 hours. I'm thinking about re-connecting the satellite box (but preserving the breakout setup at 1X6 for a while longer) and re-locking the IMC. I'll also start c1ass on the c1lsc frontend. I would say that the other models on c1lsc (i.e. c1oaf, c1cal, c1daf) aren't really necessary for basic IFO operation.
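Starting a model is just the counterpart of the rtcds stop command noted in elog 13480; a sketch:

# on the c1lsc front end
rtcds start c1ass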

Quote:

As part of this slow but systematic debugging, I am turning on the c1lsc model overnight to see if the model crashes return.

  14166   Wed Aug 15 21:27:47 2018 gautamUpdateCDSCDS status update

Starting c1cal now, let's see if the other c1lsc FE models are affected at all... Moreover, since MC1 seems to be well-behaved, I'm going to restore the nominal eurocrate configuration (sans extender board) tomorrow.

  14192   Tue Sep 4 10:14:11 2018 gautamUpdateCDSCDS status update

c1lsc crashed again. I've contacted Rolf/JHanks for help since I'm out of ideas on what can be done to fix this problem.

Quote:

Starting c1cal now, let's see if the other c1lsc FE models are affected at all... Moreover, since MC1 seems to be well-behaved, I'm going to restore the nominal eurocrate configuration (sans extender board) tomorrow.

  14193   Wed Sep 5 10:59:23 2018 gautamUpdateCDSCDS status update

Rolf came by today morning. For now, we've restarted the FE machine and the expansion chassis (note that the correct order in which to do this is: turn off computer--->turn off expansion chassis--->turn on expansion chassis--->turn on computer). The debugging measures Rolf suggested are (i) to replace the old generation ADC card in the expansion chassis which has a red indicator light always on and (ii) to replace the PCIe fiber (2010 make) running from the c1lsc front-end machine in 1X6 to the expansion chassis in 1Y3, as the manufacturer has suggested that pre-2012 versions of the fiber are prone to failure. We will do these opportunistically and see if there is any improvement in the situation.

Another tip from Rolf: if the c1lsc FE is responsive but the models have crashed, then doing sudo reboot by ssh-ing into c1lsc should suffice* (i.e. it shouldn't take down the models on the other vertex FEs, although if the FE is unresponsive and you hard reboot it, this may still be a problem). I've modified the c1lsc reboot script accordingly.

* Seems like this can still lead to the other vertex FEs crashing, so I'm leaving the reboot script as is (so all vertex machines are softly rebooted when c1lsc models crash).
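For reference, a minimal sketch of what the reboot script does as described above (the script internals, and the list of vertex FEs being c1lsc/c1sus/c1ioo, are assumptions):

# softly reboot all vertex front ends after a c1lsc model crash
for fe in c1lsc c1sus c1ioo; do
    ssh controls@$fe 'sudo reboot'
done
# once the machines are back up, the models still have to be restarted with rtcds start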

Quote:

c1lsc crashed again. I've contacted Rolf/JHanks for help since I'm out of ideas on what can be done to fix this problem.

  9077   Wed Aug 28 00:41:23 2013 JenneUpdateCDSCDS svn commits not happening

svn status update: asx, als and ioo were found uncommitted. Not sure who modified ioo last after Jenne.

//edit Manasa - edited the elog instead of replying//
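For reference, uncommitted model changes can be listed with something like this (a sketch; the userapps working-copy path is an assumption):

cd /opt/rtcds/userapps
svn status | grep '^M'   # 'M' marks locally modified, uncommitted files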

  9079   Wed Aug 28 05:21:58 2013 manasaUpdateCDSCDS svn commits not happening

I am responsible for missed svn commits with als and asx. I have committed them.

But I have not modified anything with ioo in the last few weeks.

 

  13166   Fri Aug 4 09:07:28 2017 ranaUpdateCDSCDS system essentially NOT fully recovered

Tried getting trends with dataviewer just now since Jamie re-enabled the minute_raw frame writing yesterday. Unable to get trends still:

Connecting to NDS Server fb1 (TCP port 8088)
Connecting.... done
Server error 18: trend data is not available
datasrv: DataWriteTrend failed in daq_send().
unknown error returned from daq_send()
T0=17-08-04-08-02-22; Length=28800 (s)
No data output.
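One thing worth checking is whether the trend frames are actually landing on disk (a sketch; the /frames layout is an assumption):

# on fb1: list the most recently written raw minute trend frames
ls -lt /frames/trend/minute_raw | head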

  13153   Mon Jul 31 18:44:40 2017 JamieUpdateCDSCDS system essentially fully recovered

The CDS system is essentially fully recovered at this point.  The mx_streams are all flowing from all front ends and from all models, and the daqd processes are receiving them and writing the data to frames:

Remaining unresolved issues:

  • IFO needs to be fully locked to make sure ALL components of all models are working.
  • The remaining red status lights are from the "FB NET" diagnostics, which are reflecting a missing status bit from the front end processes due to the fact that they were compiled with an earlier RCG version (3.0.3) than the mx_streams were (3.3+/trunk).  There will be a new release of the RTS soon, at which point we'll compile everything from the same version, which should get us all green again.
  • The entire system has been fully modernized, to the target CDS reference OS (Debian jessie) and more recent RCG versions.  The management of the various RTS components, both on the front ends and on fb, have as much as possible been updated to use the modern management tools (e.g. systemd, udev, etc.).  These changes need to be documented.  In particular...
  • The fb daqd process has been split into three separate components, a configuration that mirrors what is done at the sites and appears to be more stable:
    • daqd_dc: data concentrator (receives data from front ends)
    • daqd_fw: receives frames from dc and writes out full frames and second/minute trends
    • daqd_rcv: NDS1 server (raises test points and receives archive data from frames from 'nds' process)
    The "target" directory for all of these new components is:
    • /opt/rtcds/caltech/c1/target/daqd
    All of these processes are now managed under systemd supervision on fb, meaning the daqd restart procedure has changed (see the sketch after this list).  This needs to be simplified and clarified.
  • Second trend frames are being written, but for some reason they're not accessible over NDS.
  • Have not had a chance to verify minute trend and raw minute trend writing yet.  Needs to be confirmed.
  • Get wiper script working on new fb.
  • Front end RTS kernel will occasionally crash when the RTS modules are unloaded.  Keith Thorne apparently has a kernel version with a different set of patches from Gerrit Kuhn that does not have this problem.  Keith's kernel needs to be packaged and installed in the front end diskless root.
  • The models accessing the dolphin shared memory will ALL crash when one of the front end hosts on the dolphin network goes away.  This results in a boot fest of all the dolphin-enabled hosts.  Need to figure out what's going on there.
  • The RCG settings snapshotting has changed significantly in later RCG versions.  We need to make sure that all burt backup type stuff is still working correctly.
  • Restoration of /frames from old fb SCSI RAID?
  • Backup of entirety of fb1, including fb1 root (/) and front end diskless root (/diskless)
  • Full documentation of rebuild procedure from Jamie's notes.
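As referenced in the daqd item above, a sketch of the new systemd-based restart procedure (assuming the unit names match the three component names):

# on fb1
sudo systemctl restart daqd_dc    # data concentrator
sudo systemctl restart daqd_fw    # frame writer
sudo systemctl restart daqd_rcv   # NDS1 server
systemctl status 'daqd_*'         # confirm all three came back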
  11687   Tue Oct 13 17:04:54 2015 ericqUpdateCDSCDS things

After some discussion at last week's 40m meeting, I changed the interval at which daqd tries to write out minute trends from one hour to two hours.

This has eliminated the hourly crashes. daqd still crashes sometimes, but only a few times per day.

However, looking at the oplev summary pages that actually use the minute trends, it looks like they're only sporadically getting successfully written out.


Also, I was having a lot of problems with the frontends' EPICS processes dying when I would try to update the SDF table. I rebuilt all of the frontends with RCG 2.9.6, which differs from the 2.9.4 that we had been running by SDF bugfixes and an RMS calculation bugfix. The SDF procedures are much more stable now. 

I have not yet discovered anything broken by this change, and the tests I made for the last upgrade were all fine; last week's tiny DRFPMI lock was achieved after this change.
