I didn't have separate training and test data sets, which I think is why the graphs came out looking too good. The units on the graphs were also incorrect: I was interpreting the PSD as an ASD. I haven't been able to get my Wiener filtering code working well. I get unreasonable subtractions, such as the noise being larger than the unfiltered signal, so Eric showed me the frequency-dependent calculation described here: https://dcc.ligo.org/LIGOP990002
This seems to be working well so far:
freq1.pdf
freq2.pdf
freq3.pdf
Here are all the plots on one figure:
frequency_dependent.pdf
Let me know if this looks believable.
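For reference on the units mix-up, here is a minimal sketch of the PSD/ASD relationship. It uses synthetic white noise and an assumed sample rate, not the actual channels, and scipy.signal.welch is just one convenient estimator, not necessarily what produced the plots above.

```python
import numpy as np
from scipy.signal import welch

fs = 256.0                                      # assumed sample rate [Hz]
rng = np.random.default_rng(0)
x = rng.normal(scale=2.0, size=int(64 * fs))    # synthetic white noise, std = 2

# welch returns a one-sided power spectral density in units^2 / Hz
f, psd = welch(x, fs=fs, nperseg=int(4 * fs))

# The amplitude spectral density is its square root, in units / sqrt(Hz)
asd = np.sqrt(psd)

# Sanity check: integrating the PSD over frequency recovers the variance
df = f[1] - f[0]
variance_from_psd = np.sum(psd) * df
```

Plotting psd on an axis labelled in units/rtHz overstates or understates the level by a square root, which is the kind of mislabelling described above.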
Quote: 
Seems too good to be true. Maybe you're overfitting? Please put all the traces on one plot and let us know how you do the parameter setting. You should use half the data for training the filter and the second half for doing the subtraction.
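A rough sketch of the train/test split suggested above, on synthetic data. Everything here is an assumption for illustration: the sample rate, the two-tap coupling, the noise level, and the variable names are made up, and the transfer-function estimate (cross-spectrum divided by witness PSD) is a generic frequency-domain approach, not necessarily the calculation in the linked document.

```python
import numpy as np
from scipy.signal import csd, welch, lfilter

fs = 256.0                                   # assumed sample rate [Hz]
n = int(128 * fs)
rng = np.random.default_rng(1)

# Synthetic stand-ins for a witness channel and a target channel
witness = rng.normal(size=n)
coupled = lfilter([0.5, 0.3], [1.0], witness)    # hypothetical coupling path
target = coupled + 0.1 * rng.normal(size=n)      # plus independent noise

# First half trains the filter, second half is used for the subtraction
half = n // 2
w_train, t_train = witness[:half], target[:half]
w_test, t_test = witness[half:], target[half:]

# Estimate the witness-to-target transfer function on the training half
nperseg = int(4 * fs)
f, p_ww = welch(w_train, fs=fs, nperseg=nperseg)
_, p_wt = csd(w_train, t_train, fs=fs, nperseg=nperseg)
h = p_wt / p_ww

# Interpolate onto the FFT grid of the test half and apply it there
# (FFT multiplication is a circular convolution, so the edges are approximate)
f_test = np.fft.rfftfreq(len(w_test), d=1.0 / fs)
h_interp = np.interp(f_test, f, h.real) + 1j * np.interp(f_test, f, h.imag)
prediction = np.fft.irfft(h_interp * np.fft.rfft(w_test), n=len(w_test))
residual = t_test - prediction

# Compare ASDs before and after subtraction, both from the test half only
_, psd_before = welch(t_test, fs=fs, nperseg=nperseg)
_, psd_after = welch(residual, fs=fs, nperseg=nperseg)
asd_before, asd_after = np.sqrt(psd_before), np.sqrt(psd_after)
```

Because the filter never sees the test half, the residual cannot drop below the independent noise in the target. A subtraction that goes below that floor on the training data but not on the test data is a sign of overfitting.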

