# Baseline noise removal test in Spectral Workbench 2

by warren | 05 Feb 16:20

### What I want to do

I want to test out some of the new tools in Spectral Workbench 2 and see if we can do some useful things like reducing noise in spectra.

### My attempt and results

Some spectra have high baseline noise -- whether due to the webcam itself, or due to stray light in the spectrometer.

I used a Transform operation with the expression Math.max(A-0.15,0)+(0.15*Math.max(A-0.15,0)) to subtract out data that falls below 15% -- and the way I did it also stretches the remaining data back out over almost the full 0-100% range. You can see the difference before and after the noise reduction here:

Basically, this:

• takes the average and subtracts 15%: A-0.15
• cuts off anything that falls below zero: Math.max(A-0.15,0)
• adds back a share of the 15%, in proportion to how far the original value was above 15%: +(0.15*Math.max(A-0.15,0))
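The three steps above can be sketched as plain JavaScript. This is only an illustration: the function name `removeBaseline` is made up here, and it assumes A is the averaged intensity scaled from 0 to 1, as the 0.15 threshold suggests.

```javascript
// Hypothetical helper mirroring the Transform expression step by step.
// Assumption: A is the averaged channel intensity on a 0..1 scale.
function removeBaseline(A) {
  const shifted = A - 0.15;             // subtract 15%
  const clipped = Math.max(shifted, 0); // cut off anything below zero
  return clipped + 0.15 * clipped;      // add back a share of the 15%
}

// Try a value below the threshold, one at it, a mid value, and a full-scale peak.
for (const A of [0.05, 0.15, 0.5, 1.0]) {
  console.log(A, '->', removeBaseline(A).toFixed(4));
}
```

Values at or below 15% come out as zero, and everything above is shifted down and then scaled up by a factor of 1.15.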

The last part is not perfect; if the original value is 100%, it only adds back a maximum of 85% of 15%, which drops the final value by 2.25% versus the original. This is shown at the highest peaks of the comparison graph, but the effect is most prominent only at the very highest peaks. I could remove this with a more complex expression, but it seems not worth it to me.
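To make the 2.25% figure concrete, here is a small sketch of the arithmetic, again assuming A runs from 0 to 1. The `rescaled` function is a hypothetical alternative, not something tested in Spectral Workbench: dividing by 0.85 is one way the surviving range could be stretched back to exactly 0-100%.

```javascript
// Original expression from the post; A is assumed to be on a 0..1 scale.
const original = (A) => Math.max(A - 0.15, 0) + 0.15 * Math.max(A - 0.15, 0);

// Hypothetical alternative (an assumption, not from the post): stretch the
// surviving 0..0.85 range back onto 0..1 by dividing by 0.85.
const rescaled = (A) => Math.max(A - 0.15, 0) / 0.85;

// At full scale the original adds back only 0.85 * 0.15 = 0.1275 of the
// 0.15 that was subtracted, leaving a shortfall of 0.0225.
const shortfall = 1 - original(1);

console.log('original(1) =', original(1).toFixed(4));
console.log('shortfall   =', shortfall.toFixed(4));
console.log('rescaled(1) =', rescaled(1).toFixed(4));
```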

Try it out yourself by forking these two and tweaking the operation yourself: https://spectralworkbench.org/sets/show2/3163

Note: If this looks good, I can package it up as a "noise reduction" tool in its own right -- the Transform operation may enable us to quickly create lots of additional tools using basic math. If you think adding such a "noise reduction" tool is a good idea, speak up!

I'm confused by the two plots. The 'noisy' plot doesn't have the signal so I'm guessing it's an average of what is left after chopping off anything above 15% of max? Chopping out data inherently produces artifacts (errors). I'm also not understanding what the "add back 15%" is doing -- just numerically adding the offset to the peaks back after 15% was subtracted? I suspect this is not error free.

I'm also not convinced of this as "noise reduction"; it's sort of a "baseline subtraction" technique but not noise reduction. It's more similar to the visual cut-off of the "grass" (noise) of an analog RF spectrum analyzer. It looks nicer but doesn't improve the SNR.

To do noise reduction (which I do promote as a viable technique), you need to actually increase the SNR without eliminating either small, sharp peaks or low level broad "bumps" in the spectrum as both of those have real information which should be preserved.

One noise reduction technique I suggested in my HDR notes ( https://publiclab.org/notes/stoft/5-25-2013/hdr-search-high-er-dynamic-range and https://publiclab.org/notes/stoft/03-09-2014/hdr2-using-over-exposure-to-your-advantage ) is to grab, and average, a few lines of camera data just outside of the spectral RGB band but within the same camera image area. That represents the camera's "dark noise" (well, dark noise + light leakage and quantization noise from an 8-bit ADC). Subtracting that data does remove things like errant refracted light that is not part of the diffraction process. The data is averaged because you will simply subtract this background "DC" noise offset from the real signal you get from using 3 parallel lines of in-band data.
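A minimal sketch of that background subtraction, under several assumptions: rows are plain arrays of pixel intensities, the out-of-band rows are averaged column by column (a single scalar offset would be another reading of "DC"), and negative results are clamped to zero. The names `averageRows` and `subtractDark` and the sample numbers are made up for illustration.

```javascript
// Average several pixel rows column by column.
function averageRows(rows) {
  const width = rows[0].length;
  const sum = new Array(width).fill(0);
  for (const row of rows) {
    for (let x = 0; x < width; x++) sum[x] += row[x];
  }
  return sum.map((v) => v / rows.length);
}

// Subtract the averaged out-of-band ("dark") rows from the averaged in-band
// rows. Clamping at zero is an assumption, so intensities never go negative.
function subtractDark(inBandRows, darkRows) {
  const signal = averageRows(inBandRows);
  const dark = averageRows(darkRows);
  return signal.map((v, x) => Math.max(v - dark[x], 0));
}

// Made-up 8-bit sample data: 3 in-band rows and 2 rows from just outside the band.
const inBand = [[12, 40, 200], [14, 42, 198], [13, 38, 202]];
const dark = [[10, 10, 12], [12, 10, 12]];
const corrected = subtractDark(inBand, dark);
console.log(corrected);
```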

Next, you could automatically take a set of 3-5 "passes of data" for a given spectrum and average those because with Gaussian noise (most of the webcam noise) the noise will decorrelate and leave only the correlated signals -- although broad, low SNR "humps" may still suffer a bit. Note, 3-5 "passes" of data means 3-5 separate sets of frame data, even if they are successive frames.
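As a rough illustration of why averaging passes helps, the sketch below builds a synthetic spectrum, adds Gaussian noise to five simulated passes, and compares the RMS error of one pass with that of the average. Everything here is an assumption for demonstration: the signal shape, the noise level of 0.05, the seeded random generator, and the helper names.

```javascript
// Small seeded PRNG (mulberry32) so the demo is repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: one standard-normal sample from two uniform samples.
function gaussian(rand) {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Element-wise mean of several passes (arrays of equal length).
function averagePasses(passes) {
  const sum = new Array(passes[0].length).fill(0);
  for (const pass of passes) {
    for (let i = 0; i < sum.length; i++) sum[i] += pass[i];
  }
  return sum.map((v) => v / passes.length);
}

// Root-mean-square difference between a measured and a reference spectrum.
function rmsError(measured, truth) {
  const sq = measured.reduce((acc, v, i) => acc + (v - truth[i]) ** 2, 0);
  return Math.sqrt(sq / measured.length);
}

const rand = mulberry32(42);
const truth = Array.from({ length: 1000 }, (_, i) => 0.5 + 0.4 * Math.sin(i / 50));
const makePass = () => truth.map((v) => v + 0.05 * gaussian(rand));
const passes = [makePass(), makePass(), makePass(), makePass(), makePass()];

const singleRms = rmsError(passes[0], truth);
const averagedRms = rmsError(averagePasses(passes), truth);
console.log('single pass RMS error:', singleRms.toFixed(4));
console.log('5-pass average RMS error:', averagedRms.toFixed(4));
```

For uncorrelated noise, the error of an N-pass average is expected to shrink by roughly the square root of N, while the underlying signal is unchanged.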

For residual noise within the spectrum, you need to 1) average 3 parallel lines of spectral data from within a single frame (which decorrelates the in-spectrum camera noise) and 2) average (LPF) the final spectrum where the LPF cutoff is no lower than about 1-2nm so as to preserve the real data resolution of 2-3nm for the best of PLab systems.
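Step 2 could be approximated with a simple moving average, sketched below. This is a crude stand-in for a proper low-pass filter, and it assumes the nm-per-pixel scale is known from calibration. The helper names and the 0.5 nm/pixel figure are hypothetical.

```javascript
// Convert a smoothing width in nm to an odd number of pixels.
// Assumption: nmPerPixel comes from the spectrum's wavelength calibration.
function windowForNm(widthNm, nmPerPixel) {
  const px = Math.max(1, Math.round(widthNm / nmPerPixel));
  return px % 2 === 0 ? px + 1 : px; // force odd so the window is centred
}

// Centred moving average; the window shrinks at the edges of the array.
function movingAverage(values, windowPx) {
  const half = Math.floor(windowPx / 2);
  return values.map((_, i) => {
    const start = Math.max(0, i - half);
    const end = Math.min(values.length - 1, i + half);
    let sum = 0;
    for (let j = start; j <= end; j++) sum += values[j];
    return sum / (end - start + 1);
  });
}

// Example: a 1.5 nm window at a hypothetical 0.5 nm per pixel is 3 pixels wide.
const windowPx = windowForNm(1.5, 0.5);
const smoothed = movingAverage([0, 0, 3, 0, 0], windowPx);
console.log(windowPx, smoothed);
```

Keeping the window at about 1-2 nm, below the 2-3 nm resolution mentioned above, is what keeps the smoothing from blurring real features.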

These can all be directly implemented with simple math from the frame data you can already collect. If some are fast enough, they could simply be automatic.


Hi, Dave - try reloading the page - I briefly edited the spectrum while chasing a bug, so you may have seen it in the wrong state -- tell me what you think.

You're right that we don't want to mix this up with boosting signal to noise -- we're just cutting out baseline noise. I'm changing the title slightly to be consistent with the set title -- 'removing baseline noise' -- not just removing noise.

As to time-averaging, I think it'd be pretty easy to implement, but would ONLY work if a spectrum is really vertically aligned -- so no uploaded spectra which are not from the live capture "waterfall" graph. I'm not sure how to make that extra clear so folks don't just time average and smear their data.

Fetching multiple pixel rows is possible with the spectrum.imgToJSON() method in the new v2 API:

https://github.com/publiclab/spectral-workbench/blob/master/app/assets/javascripts/spectralworkbench/SpectralWorkbench.Spectrum.js#L421

Ah yes, the new plots are there. However, the entire shape of the spectra has been shifted so the ratios between peaks are thus different from the original. You have therefore altered the real data -- which is, technically, now invalid. If you want to simply cut off the base of the spectra so you simply don't see the noise (and other smaller signals) then that is all you can do. This typically means that near the base of the peaks, they suddenly just "drop off" to zero -- but that, at least, is an honest indication of what just happened -- and not an illusion that there was some form of noise reduction going on. Rule one is never "lie" (grin) to the user.

In theory, if a user has a stable system (everyone has learned that by now, right?) then uploading 3 successive spectra could work -- albeit maybe with less control over the process so the user might have to be warned that their results are dependent on having stable data ..... but warnings are easy and nearly free.

I consistently use averaging 3 parallel rows in matlab for my plots as it does produce more stable spectra.

As a side note, I have also noted that if I "search" around the spectral band and observe the resulting spectra, I can get potentially significant variation in the spectra despite the added stability of my V3 proto. I suspect this is related to the optical limitations of the design but I cannot yet either prove it or find a correlation -- but I believe it is real. So, in the meantime, I think at least averaging 3 parallel lines of pixels is actually necessary.


Well, I'm not saying this is an important step in any particular analysis -- just that you can do operations like this easily with the transform tool. You can use a different expression to do a simple cutoff -- and if you can come up with one, that'd be great to share.

I can think of one way to do it differently -- to run Math.ceil(A-0.15) and multiply that by the original value. That way anything above 15% is multiplied by 1 (so it stays the same) and anything at or below 15% is dropped to zero. That gives Math.ceil(A-0.15)*A, which would look like this:

My point is that we can try lots of variations easily using the new Operations system.
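For anyone who wants to check the simple cutoff expression outside the Transform tool, here is a small sketch. It assumes A is scaled from 0 to 1, and the name `hardCutoff` is made up for illustration.

```javascript
// Hard cutoff: keep values above 15% unchanged, drop the rest to zero.
// Assumption: A is on a 0..1 scale, so Math.ceil(A - 0.15) is 0 or 1.
// For A below 0.15, Math.ceil returns -0, so the product is -0,
// which compares equal to 0.
const hardCutoff = (A) => Math.ceil(A - 0.15) * A;

for (const A of [0.05, 0.15, 0.5, 1.0]) {
  console.log(A, '->', hardCutoff(A));
}
```

Unlike the first expression, values above the threshold keep their original heights, so the ratios between peaks are preserved.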