So the IPC situation on the front end network is not so great right now. For various no-longer-valid reasons, c1lsc had no RFM card, so all of its IPC connections were routed through the c1rfm model on c1sus and passed on to c1lsc via dolphin PCIe as needed. As things grew, c1rfm became overloaded. Koji tried to fix the situation by breaking things out of c1rfm to make direct connections where we could. This cleared up c1rfm a bit, but now c1mcs is overloaded.
Reminder: PCIe (dolphin) is faster and higher bandwidth than RFM. The more things we can put on PCIe the better.
Attached is a graph of my rough accounting of the intended direct IPC connections between the front ends. By "intended direct" I mean what should be direct connections if we had all the appropriate hardware. Right now the actual connection graph is more convoluted than this since things are passing through c1rfm. I note this graph was NOT particularly easy to make, which is very unfortunate. I had to manually look through every model and determine the ultimate source of every incoming IPC. Kind of a pain in the butt. It would be nice if there was a simple way to represent this.
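One possible way to represent this, as a rough sketch only: keep the accounting as a plain table of (source, destination) pairs and generate the graph from it. The Python below assumes the table is filled in by hand; the only entries shown are the rough c1ioo counts quoted further down in this entry, and the names IPC_COUNTS, to_dot and totals_per_host are made up for illustration, not part of any existing tool.

```python
from collections import defaultdict

# (source front end, destination front end) -> approximate number of IPCs.
# Only the rough counts quoted in this entry are filled in; the rest would
# have to come from going through the models.
IPC_COUNTS = {
    ("c1ioo", "c1sus"): 8,
    ("c1sus", "c1ioo"): 4,
    ("c1ioo", "c1lsc"): 3,
}


def to_dot(ipc_counts):
    """Return Graphviz DOT text with one labelled edge per (source, destination) pair."""
    lines = ["digraph ipc {"]
    for (src, dst), count in sorted(ipc_counts.items()):
        lines.append(f'    "{src}" -> "{dst}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


def totals_per_host(ipc_counts):
    """Return (outgoing, incoming) IPC totals per front end."""
    outgoing = defaultdict(int)
    incoming = defaultdict(int)
    for (src, dst), count in ipc_counts.items():
        outgoing[src] += count
        incoming[dst] += count
    return dict(outgoing), dict(incoming)


if __name__ == "__main__":
    print(to_dot(IPC_COUNTS))
```

The DOT text can be rendered with Graphviz, and the per-host totals give a quick look at which front ends carry the most IPC traffic.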
Here are some possible solutions to the problem, as I see it:
a) put c1lsc on the RFM network
This would allow c1lsc to talk to c1ioo, c1iscex, and c1iscey without having to go through c1sus, thereby eliminating c1rfm altogether. I'm not sure why we didn't just do this originally.
Requires:
b) put c1ioo on the PCIe network (and move c1sus's RFM card to c1lsc)
This is probably the most robust solution.
b1) There are roughly 8 IPCs going from c1ioo to c1sus, 4 going the other way, and 3 IPCs from c1ioo to c1lsc. If we put c1ioo on PCIe, all of these connections, which currently go over RFM, would become direct PCIe connections, which would be a big win.
At this point only the end station front ends would be on RFM, and most of the connections to those come from c1lsc, so it would make sense to give c1lsc the RFM card, thereby eliminating a lot of stuff from c1rfm.
Requires:
- dolphin card for c1ioo (do the old sun machines support these? if they don't, we could swap the old sun machine for one of the aLIGO-approved supermicro machines, which we have spares of)
- dolphin fibre to go to dolphin switch in 1X3 rack
b2) OR, we could move c1ioo to 1X4 with c1lsc and c1sus, and get a OneStop fibre cable to connect to its IO chassis. We would still need a dolphin card, but we could use copper instead of fibre. This is my preferred solution, since it moves c1ioo out of 1X1, where it's really in the way and making a lot of noise. It would also be easier to manage all the machines if they're together in one rack.
Requires:
- dolphin card for c1ioo
- dolphin copper cable for c1ioo
- OneStop fibre for c1ioo
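To put a rough number on what option b buys, here is a small sketch that tallies the c1ioo IPCs by transport before and after. The network membership tables are my reading of this entry, not a dump of the real configuration, the counts are the rough ones quoted above, and all names (CURRENT, OPTION_B, transport, tally) are made up for illustration. "relay" means the two hosts share no network, so the signal has to pass through another front end, as it does through c1rfm now.

```python
# Which networks each front end sits on. These tables are assumptions
# based on the description in this entry.
CURRENT = {
    "c1sus": {"rfm", "pcie"},
    "c1lsc": {"pcie"},
    "c1ioo": {"rfm"},
    "c1iscex": {"rfm"},
    "c1iscey": {"rfm"},
}

# Option b: c1ioo moves to PCIe and the RFM card moves from c1sus to c1lsc.
OPTION_B = {
    "c1sus": {"pcie"},
    "c1lsc": {"pcie", "rfm"},
    "c1ioo": {"pcie"},
    "c1iscex": {"rfm"},
    "c1iscey": {"rfm"},
}

# Rough IPC counts quoted in this entry.
IPC_COUNTS = {
    ("c1ioo", "c1sus"): 8,
    ("c1sus", "c1ioo"): 4,
    ("c1ioo", "c1lsc"): 3,
}


def transport(src, dst, networks):
    """Pick the transport for one connection, preferring PCIe over RFM."""
    shared = networks[src] & networks[dst]
    if "pcie" in shared:
        return "pcie"
    if "rfm" in shared:
        return "rfm"
    return "relay"  # no shared network: must go through another front end


def tally(ipc_counts, networks):
    """Sum the IPC counts per transport for a given network layout."""
    totals = {"pcie": 0, "rfm": 0, "relay": 0}
    for (src, dst), count in ipc_counts.items():
        totals[transport(src, dst, networks)] += count
    return totals


if __name__ == "__main__":
    print("current: ", tally(IPC_COUNTS, CURRENT))
    print("option b:", tally(IPC_COUNTS, OPTION_B))
```

With these assumed tables, the roughly 15 c1ioo IPCs go from 12 on RFM plus 3 relayed to all 15 on direct PCIe. The same tally could be run over the full connection table once it is written down.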
c) put another cpu in c1sus
c1sus is (I believe) able to support another 6-core cpu. If we added more cores to c1sus, we could break up c1rfm into c1rfm0, c1rfm1, etc. This is a less elegant solution imho, but it would probably do the job.
Requires: