Public Lab Research note


Finding closest match spectra from the database (GSoC) - Final Post!

by Sreyanth | September 14, 2013 21:03 | #9330

Continuing from my previous research notes [1, 2], which discuss the progress of my project, this note describes the project implementation and highlights the features it provides. The previous notes are:

  1. Finding closest match spectra from the database (GSoC) - work done so far

  2. Find closest match spectra from database - GSoC project

I would like to let you know that I have successfully completed the project and integrated it into the Spectral Workbench and am eagerly waiting for an official announcement!

What I wanted to do

A Scalable Spectral Matching Mechanism. With this in place, users see results whenever the system finds similar spectra in the database. This helps users explore and learn more about their uploaded spectra.

I now introduce my implementation of such a system which I wrote for the Spectral Workbench, with the help of my mentor, Jeffrey Warren, supported by Google's Summer of Code 2013!

How does it work?

You will now see a 'Find Similar' button on all the analyze pages (something like spectralworkbench.org/analyze/spectrum/spectrum_id) of the spectra.

new_on_analyze_page.png

On clicking this, you will be taken to the matching interface (which will be: spectralworkbench.org/match/search/spectrum_id). This interface will be used to interact with the database for finding closest matches.

Match_interface.png

As you will see, there will be some results already, and the graph on the page shows all those matches. You can click on the remove link beside any spectrum on the graph to clear it from the graph. Also, you have a 'Clear plot' button to clear the entire graph, so that you can compare the results as you wish. For that purpose, you can click on the good old 'Compare' button which will be preloaded with the results. If you want to compare with some spectrum which is not listed as a match, you can search for it using the search option and compare it!

Now, we have something called 'fit'. This determines how close the returned matches are to the main spectrum in question. The lower the fit, the closer the match.

idea.png

In the above image, the width of the search band is what we are referring to by 'fit'. For more details about the method, please refer to my note here.
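The band idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code running on Spectral Workbench: the function names, the dictionary used as a "database", and the choice to treat `fit` as the allowed deviation on each side of the curve are all hypothetical.

```python
def within_band(target, candidate, fit):
    """Return True if every candidate bin lies within +/- fit of the target bin.

    Assumption: both spectra are already binned to the same length, and
    `fit` is the allowed deviation above and below the target curve.
    """
    return all(abs(t - c) <= fit for t, c in zip(target, candidate))


def find_matches(target, database, fit):
    """Return the ids of all spectra in `database` that stay inside the band."""
    return [sid for sid, bins in database.items() if within_band(target, bins, fit)]


# Hypothetical binned intensities, for illustration only.
target = [10, 50, 90]
database = {
    "a": [12, 48, 95],   # deviates by at most 5
    "b": [10, 50, 190],  # one bin is far outside any reasonable band
    "c": [30, 50, 90],   # deviates by at most 20
}

narrow = find_matches(target, database, fit=5)
wide = find_matches(target, database, fit=20)
```

With these made-up numbers, a narrow band keeps only the closest spectrum, while a wider band also admits a looser match, which is the behaviour the 'fit' setting controls.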

Now, let's see how the 'fit' parameter changes the results.

Consider the matching page for spectrum: 431 as shown below.

431_90.png

This page shows fit = 90. The system selected this value automatically to display a good number of matches. You can always change it and watch the results change in the graph. As simple as that. So let's go ahead and see how this changes the results.

I obtain this when I change the fit parameter to 85:

431_85.png

Notice that some matches from the previous image aren't shown here! Now let's change the fit to 100 and see how it works.

431_100.png

Simple, isn't it? If you would like to, you can even go ahead and click on the "Save as set" button, and the results will be saved as a new set of spectra!
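How the system picks the initial fit (90 for spectrum 431 above) is not spelled out in this note. One plausible approach, shown here purely as a guess, is to widen the band step by step until enough matches appear. The names `count_matches` and `choose_fit`, and the default values for `min_matches`, `start`, `step` and `max_fit`, are hypothetical.

```python
def count_matches(target, database, fit):
    """Count spectra whose bins all lie within +/- fit of the target bins."""
    return sum(
        1
        for bins in database.values()
        if all(abs(t - c) <= fit for t, c in zip(target, bins))
    )


def choose_fit(target, database, min_matches=3, start=50, step=5, max_fit=100):
    """Widen the band from `start` until at least `min_matches` spectra match.

    Gives up at `max_fit` so the search always terminates.
    """
    fit = start
    while fit < max_fit and count_matches(target, database, fit) < min_matches:
        fit += step
    return min(fit, max_fit)


# Hypothetical data: each spectrum differs from the target in one bin.
target = [100, 100]
database = {
    "a": [110, 100],  # off by 10
    "b": [100, 130],  # off by 30
    "c": [160, 100],  # off by 60
    "d": [100, 190],  # off by 90
}

fit_for_three = choose_fit(target, database, min_matches=3)
fit_for_four = choose_fit(target, database, min_matches=4)
fit_capped = choose_fit(target, database, min_matches=5)
```

Starting from a small band and widening it favours the closest matches first, in line with "the lower the fit, the closer the match".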

What can this do? And what it can't?

As described in my previous note [2], it searches for matches in the close vicinity of the spectrum, both above and below it. To account for the x-shift problem, where the spectrum may be shifted in the positive or negative X-direction, i.e., to the left or right of the expected position (mostly due to differences in capturing conditions), I have averaged the relative intensity values over overlapping bins so that the curve gets smoother.
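To make the overlapping-bin averaging concrete, here is a minimal Python sketch. It assumes a spectrum is just a list of relative intensities. The function name and the `bin_size` and `step` parameters are hypothetical, and the actual bin width and overlap used on Spectral Workbench are not stated in this note.

```python
def overlapping_bin_averages(intensities, bin_size, step):
    """Average intensities over windows of `bin_size` that start every `step` points.

    When step < bin_size, neighbouring bins overlap, which smooths the curve
    and makes the result less sensitive to small shifts along the x-axis.
    """
    if bin_size <= 0 or step <= 0:
        raise ValueError("bin_size and step must be positive")
    if not intensities:
        return []
    last_start = max(len(intensities) - bin_size, 0)
    averages = []
    for start in range(0, last_start + 1, step):
        window = intensities[start:start + bin_size]
        averages.append(sum(window) / len(window))
    return averages


# Two hypothetical spectra with the same peak, shifted by one data point.
original = [0, 0, 100, 0, 0, 0]
shifted = [0, 0, 0, 100, 0, 0]

binned_original = overlapping_bin_averages(original, bin_size=4, step=2)
binned_shifted = overlapping_bin_averages(shifted, bin_size=4, step=2)
```

In this toy case the one-point shift disappears after binning, because the peak falls into the same overlapping bins in both spectra. The price is lost detail, which is one reason smoothing can let dissimilar spectra look alike.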

Due to the averaging and a higher fit parameter, some false positives will be reported, like this one: Finding Neon by Chris Fastie. We still need to apply some filtering to the matches (something like peak matching or counting). But before deciding how to do that, we want to collect details about other issues with the system so that we can solve them appropriately.
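As a rough idea of what a peak-counting filter might look like (nothing like this is implemented yet, and the function names, `min_height` threshold and `tolerance` parameter are all hypothetical), a match could be discarded when its number of peaks differs too much from the target's:

```python
def count_peaks(intensities, min_height):
    """Count local maxima that reach at least `min_height`.

    A flat-topped peak is counted once, at its left edge.
    """
    peaks = 0
    for i in range(1, len(intensities) - 1):
        here = intensities[i]
        if here >= min_height and here > intensities[i - 1] and here >= intensities[i + 1]:
            peaks += 1
    return peaks


def filter_by_peak_count(target, candidates, min_height, tolerance=0):
    """Keep candidate ids whose peak count is within `tolerance` of the target's."""
    wanted = count_peaks(target, min_height)
    return [
        sid
        for sid, bins in candidates.items()
        if abs(count_peaks(bins, min_height) - wanted) <= tolerance
    ]


# Hypothetical spectra: the target has two sharp peaks.
target = [0, 80, 0, 90, 0]
candidates = {
    "two_peaks": [5, 75, 10, 85, 5],
    "smooth_hump": [40, 45, 50, 45, 40],  # could pass a wide band, but has no sharp peaks
}

kept = filter_by_peak_count(target, candidates, min_height=60)
```

Here the smooth hump is rejected even though a wide enough band might accept it, which is the kind of false positive described above.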

Also, as straylight pointed out (in a comment on this post), the matching algorithm works only for calibrated spectra and searches for the closest matches only among calibrated ones.

Anything else?

Yes. I take this section to introduce you to another special feature -- Live Matching (which is still at the prototype stage, with better accuracy on the way).

From now on, you will see a "Start Live Matching!" button on the capture page, like this one:

live_match_button.png

After clicking on it, you will be shown the closest matches the system is able to find for the spectrum you are about to capture. This opens the way, in my opinion, to various interesting experiments!

If there are some matches, then something like this will be seen:

live_match_found.png

If there are no matches, then you will see:

live_no_matches.png

Sorry if you keep on seeing the "No matches found" message. This feature will be improved in the days to come. If you feel that this feature is distracting you from your work, you can click on the 'Stop' link displayed with a message like "Refreshing in 5 secs (Stop)" and everything will be as it was before!

What has changed from the previous note?

In the previous note, which can be found here, I introduced a working prototype of the system, and we received a great response and many suggestions. I followed those and implemented them accordingly -- including narrowing down the bin size, using all the data points available, and making use of overlapping bins.

Final words

Yep! The project has come to an end. All's well that ends well.

I had no idea what I was about to face when I took up this project. I soon realized that this is not an image processing problem (see my note here) and was lucky enough to come up with an approach to do this mathematically. Now, I know what a spectrum is. I know what calibration means. And most importantly, I contributed a small feature for the scientists, teachers, amateur physicists, and students out there who are interested in spectral analysis.

I was, and still am, excited about this project, but many times I was discouraged by experiments that failed horribly. Thanks to my mentor, Jeffrey Warren, who always motivated me to do more and to do it the right way. Thanks Jeff.

A community is what makes open source so special! And I am very lucky to be backed by some of the most interesting and innovative people. I enjoyed taking suggestions from them.

Thanks Dave, who made things very tough for me in the initial days (as I was unable to understand when he said something like bin or overexposure), but now I enjoy exchanging emails with him on a regular basis about my project! His contributions to the project are invaluable.

Special thanks to Chris Fastie and Nathan McCorkle for suggesting features and helping me find various bugs. Thanks to Bob, who offered a helping hand along with Jeff and Dave during my pre-GSoC period to help me structure my proposal. Also special thanks to the Earthquake Bolt Barnstar, Liz Barry, who was quite active and who mentioned "Sreyanth - Developing killer features for spectral workbench" in a presentation about Public Lab. Thanks to the Dev Manager, Becki Chall, who acted promptly by forwarding various details about GSoC deadlines and updates as the GSoC Administrator for Public Lab. Thank you guys for making my first ever GSoC a wonderful experience!

Also, my sincere apologies for the recently reported bugs that unintentionally popped up due to my code edits. Sorry for the inconvenience caused.

And last but not least, thanks to you for patiently reading this lengthy note! Should you have any issues, queries or suggestions, please feel free to contact me at sreyanth@gmail.com.

Thanks everyone!

Sreyanth


3 Comments

Sreyanth, awesome work, I love the interface and well done on following the project through to completion. It adds a lot to the functionality of spectral workbench and will be a valuable tool.

Like everything new, I took the opportunity to play with the interface. It picks up stuff very nicely. I used cfasties neon collection and sure enough, it matched the other spectra in the set, plus many more. Very nicely done. I like the "tolerance" feature as well.

I'm thinking the "find similar" relies on the user having calibrated the spectra? So neon spectra that haven't been calibrated are not recognised? This seems pretty obvious and it would be difficult to match spectra otherwise, but not impossible. How difficult do you think it would be to match spectra that have no wavelength information, purely using the appearance of peaks and the spatial relationship between the peaks? Maybe start with some well known spectra, Hg, Ne, H, He and so on, and without calibration give the user an option to find possible candidates.

A long while ago I suggested it would be really nice to have a plot of the "standard" type spectra that could be overlaid (or placed side by side) with a user's spectra. This would enable a quick visual comparison to be done. Such a spectra "database" would be scaled to the user's own calibration, so if they try to find, say, copper in a sample, they can see what they should be looking for. This is essentially what you've done without the "official" correct spectra. We could always rely on chris (cfastie) to donate lots of spectra, he also does some awesome work.

Anyway, I'm mega impressed. I wonder what Balmer would think of this kind of technology and the ability to match spectra from around the world using a computer algorithm. Mind boggling really.

well done sreyanth.

stu



Thanks for your compliment :-) Glad you liked the new features.

We came up with the present system assuming that the spectra would be calibrated, so at present it works only for (and on) calibrated ones. But before moving on to matching uncalibrated ones, we need to study the issues with the current system in more detail. Matching spectra purely by the appearance of peaks won't be very difficult, I suppose, but it requires some calculations, remodeling the current system, etc. Let us extend the system to cover uncalibrated spectra in future enhancements to this feature.

Thanks for pointing this out. I forgot to add this to the note under the section: "What can this do? And what it can't?". Updating it now.

Thanks again!

Sreyanth



Sreyanth--Thanks so much for all your hard work this summer. I think the end result is absolutely amazing! I'm glad to hear you got so much out of your first GSoC experience--so did we here at Public Lab. Please keep in touch--your contributions are wonderful! -Becki


