Public Lab Research note


Advanced Searching and Sorting Tool for publiclab.org

by Ujitha | March 12, 2016 12:24 | #12837

Name: T. U. R. Perera

Email: ujithaperera1@gmail.com

IRC nick: Ujitha

LinkedIn profile: https://lk.linkedin.com/in/ujithaperera

Location: Colombo, Sri Lanka (GMT +5:30)

Education: BSc (Hons) in Information Technology, Faculty of Information Technology, University of Moratuwa.

Phone: +94710573293

Project : PublicLab.org - Search and sorting

Project title - Advanced Searching and Sorting Tool

Project description

Abstract/summary (<20 words):

Developing a service (set of classes within the project) for searching and sorting to output advanced nested results by prioritising tags and power tags.

Describe the need your project fulfils:

As described in (link for issue), there is a major issue with searching profile content. Search must therefore expand to cover all possible result types in the system. As the database grows, an efficient sorting mechanism also becomes necessary. While it is difficult to get sorted, well-formatted information out of the system, developing other tools and features should not take priority, because Public Lab is all about knowledge, and knowledge should be delivered quickly and effectively. All tags should therefore be sortable by whatever pattern the user requests, such as alphabetically or by date. An effective sorting system is also a big help to users and to the admin board for sorting-based activities such as removing inappropriate content, removing spam, and modifying autocompletions.

How will your project meet this need:

From my study of the code, the search logic is written in “app/controllers/search_controller.rb”. We could develop the next search features in this same controller, but I suggest moving this code into several classes as a service, for ease of implementation and development. We can then call those searching and sorting methods from the search controller or from any other controller (for future development requirements).

Before developing the search methods, we need a sufficient study of the possible data retrievals from the system and of how often these queries are triggered. With this information we can develop good mechanism(s) from scratch using the Active Record query interface, or we can use supporting gems like ransack, squeel, or sunspot. But we have to check for conflicts with the current plots2 system before proceeding.

After the search results are retrieved, we can restrict this data and order it according to the parameters the user requests. This can also be developed with the support of gems like mongoid. We can add all the customizations needed to suit the “Search and sorting” idea.
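The service idea above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the plots2 implementation: `SearchService`, the `Note` struct, and the `SORTABLE` list are hypothetical names, and an in-memory array stands in for the database. The point it shows is that a controller would only pass a query and sort parameters, and that the sort key is checked against a whitelist before use.

```ruby
# Hypothetical in-memory stand-in for a research note record.
Note = Struct.new(:title, :tags, :created_at)

# Sketch of a search-and-sort service that any controller could call.
class SearchService
  # Only these keys may be used for ordering; anything else falls back to :title.
  SORTABLE = %i[title created_at].freeze

  def initialize(records)
    @records = records
  end

  # Returns records whose title or tags contain the query (case-insensitive),
  # ordered by the requested key and direction.
  def search(query, sort_by: :title, direction: :asc)
    needle = query.to_s.downcase
    matches = @records.select do |r|
      r.title.downcase.include?(needle) ||
        r.tags.any? { |t| t.downcase.include?(needle) }
    end
    key = SORTABLE.include?(sort_by.to_sym) ? sort_by.to_sym : :title
    sorted = matches.sort_by { |r| r.public_send(key) }
    direction.to_sym == :desc ? sorted.reverse : sorted
  end
end

notes = [
  Note.new("Water quality sensor", ["water", "sensor"], Time.utc(2016, 1, 5)),
  Note.new("Balloon mapping kit", ["mapping"], Time.utc(2016, 2, 1)),
  Note.new("Air sensor build", ["air", "sensor"], Time.utc(2016, 3, 1))
]

service = SearchService.new(notes)
results = service.search("sensor", sort_by: :created_at, direction: :desc)
puts results.map(&:title)
```

In a real Rails app the `select` and `sort_by` calls would be replaced by database queries; the whitelist matters there because the sort key would otherwise come straight from user input.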

Timeline/milestones:

[Image: Screen_Shot_2016-03-12_at_3.34.33_PM.png (timeline/milestones screenshot)]

What broader goal is your project working towards?

As the number of users and research notes grows, the searching and sorting process might get slow. I therefore plan to expose open variables through which another controller or class can be connected. In future we can then adopt new technologies without much modification to the existing code base.

Also, given the range of data we have, there may be a need to drop relational databases and move to NoSQL or object-oriented databases. In that scenario, too, our searching and sorting should keep working by using suitable adapters for our system.
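The adapter idea can be sketched as follows. All class names here (`MemoryBackend`, `IndexBackend`, `Searcher`) are hypothetical, and both backends are simple in-memory stand-ins rather than real database or search-server clients. The assumption is only that every backend answers `search(query)` with an array of hashes that have a `:title` key, so the calling code does not change when the storage does.

```ruby
# Hypothetical backend that searches an in-memory array of hashes.
class MemoryBackend
  def initialize(docs)
    @docs = docs
  end

  def search(query)
    @docs.select { |d| d[:title].downcase.include?(query.downcase) }
  end
end

# Hypothetical second backend with a different storage layout (title => id),
# standing in for a NoSQL store or an external search server.
class IndexBackend
  def initialize(index)
    @index = index
  end

  def search(query)
    @index.select { |title, _id| title.downcase.include?(query.downcase) }
          .map { |title, id| { id: id, title: title } }
  end
end

# The only class the rest of the application talks to.
class Searcher
  def initialize(backend)
    @backend = backend
  end

  # Returns matching titles in alphabetical order, whatever the backend is.
  def titles_for(query)
    @backend.search(query).map { |doc| doc[:title] }.sort
  end
end

docs  = [{ id: 1, title: "Kite mapping" }, { id: 2, title: "Balloon mapping" },
         { id: 3, title: "Spectrometer" }]
index = { "Kite mapping" => 1, "Balloon mapping" => 2, "Spectrometer" => 3 }

from_memory = Searcher.new(MemoryBackend.new(docs)).titles_for("mapping")
from_index  = Searcher.new(IndexBackend.new(index)).titles_for("mapping")
puts from_memory == from_index
```

Swapping the backend is then a one-line change where the `Searcher` is constructed.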

What resources will you need :

I will definitely need more dummy or live data from the system than the seeds file provides.

Documentation for the technologies that I’m going to use, and research papers about data searching and sorting algorithms.

Guidance from the PublicLab director board and from my mentor.

Setup:

Yes, I have forked the publiclab/plots2 repository and I have a running version on my local machine. I have also pulled all the updates from the remote repository.

cloud9 development environment

Forked repository

Experience :

Before I attended university I practised Java technologies, with MySQL as a database server. In Java I have very good knowledge of Java SE and Java EE development. I learnt the fundamental building blocks of the web from Java web applications and the Spring framework. At university I moved to Ruby, with Rails as a web development framework. I started web development four years ago and have come a long way since. I have experience in technologies such as Ruby, Java, JavaScript, HTML, CSS, and Git, and frameworks/libraries such as Java SDK, Android SDK, Spring MVC, Hibernate, Ruby on Rails, jQuery, jQuery UI, Backbone.js etc. During the 2nd year of my undergraduate course I also worked part time at a company called Vesess, which specializes in web application development. I mostly worked with technologies like Ruby, RoR, MySQL, Git, JavaScript etc.

During this period I developed HappyInvoicing using Ruby on Rails and developed a few modules for the project called vgo, which is the main backend app of the Sri Lankan taxi service "vgo", at Vesess.
I am now an intern at Vesess, contributing to the Hiveage project by improving its test cases. Yes, I went through the contributor guidelines; I have good knowledge of Git and am very familiar with both GitHub and GitLab.

Teamwork :

As a university student, teamwork is nothing new to me. We work as a team in all our academic projects. In my first year we developed an automated gas leakage detector with another five students, and in my second year we developed a software project to store small Java apps and sell them to users. Both were one-year projects with five members in the team. So from experience I know that there are barriers, failures and conflicts when working in a team, and I also know how to manage the difficulties we face and how to communicate with teammates in a motivating manner. As I mentioned above, I'm glad to say that with this experience I worked with a professional software development team in industry, and my teamwork skills let me contribute successfully to those SaaS projects developed using Ruby on Rails.

Expertise :

I have experience with nested SQL queries in Rails and have also worked with PostgreSQL. Since Rails is independent of the database server, I can use my knowledge to write searching and sorting algorithms for the current requirement.

Audience :

My searching and sorting service serves all users of publiclab.org and all admins of the system. My solution provides a very user-friendly search experience, and tag sorting can be done with a few simple clicks. Any kind of user can therefore take advantage of this service; no technical background is needed. And through the API, any authorised user can search data from the publiclab.org knowledge base.

Context :

Free and open source means that you are free to explore it, develop it, adjust it or distribute it, unlike proprietary software. It’s a great concept for sharing knowledge and resources among people without commercial or any other restrictions. Long story short, open source is my passion now. I’m more interested in web applications than desktop applications. When I was developing web applications I noticed that almost every library/framework that I used is open source. That’s how I became interested in the open source community.

Ongoing involvement:

These are my previous contributions to the project:

https://github.com/publiclab/plots2/issues/383

https://github.com/publiclab/plots2/issues/407

I was also able to find the mysql12 version issue and notified the developers through the plots2-dev group. I was then able to fix this issue with mentor Jeffrey Warren.

It's a pleasure for me to work with a team like Publiclab. I would like to join the Publiclab journey and contribute to the plots2 project, especially in backend development. After the GSoC event I will research more about searching and sorting, and I hope to improve the system over time with the help of the community. It's very useful for my academic work at university and a great experience for my life.

Commitment:

Yes, I understand the importance of this project, and I can spend the time required to complete each task without watching the clock. Even though my mentors and I are in completely different time zones, I can keep up communication with my mentor organization without interruptions or delays.


15 Comments

Hi, Ujitha - good overview here. I think Sunspot may have more promise than Elasticsearch -- I wasn't able to find Elasticsearch licensing information -- is it open source? That it's not hosted on a git repository seems a bit odd to me. I'm not sure we'd be interested in using a hosted commercial product instead of a standard module for search in Rails.

With Sunspot, we'd have to set up a Sunspot server, but that sounds reasonable. We could coordinate with @icarito, our sysadmin, once you got a test version running.

I like the idea of systematizing and standardizing our search. We could also use a search function for tags in more places, such as to suggest tags from the tag form. And in the new Dashboard, we want a wiki-specific search in the right-hand column, and the Q&A project will need a good full-text search for when people type in new questions. So lots of good reasons to have an optimized and re-plumbed search.

@liz, any thoughts on prioritization of this vs. other projects? I know the WWG priorities list shows both the Q&A feature and the Rich Profiles features, but not this Search and Sorting system. Depending on how many students apply, how would you prefer that we prioritize this?



Actually, Ransack looks pretty nice in that it has no external dependencies -- do I understand that correctly? Even the very primitive search system we have now doesn't have too much trouble with the relatively small amount of data in the site, so I wonder if Sunspot is overkill and we could use Ransack.



@jeff, yes, I agree with you. Since Elasticsearch is a somewhat heavier system than Sunspot and Ransack, we can find our solution in Sunspot or Ransack. Of the two, I also think Ransack is good for us.


Hey Ujitha! I believe Sunspot will be better. I'm familiar with Sunspot gem too. Do consider using Sunspot for your proposal, it'll be easier for us to coordinate as a search functionality is vital for my project.

Thanks!



Hi Jitesh - since Ransack does not have external dependencies, I believe, it could be easier to deploy for us. But we're open to Sunspot -- what are the reasons you believe it will be better? Thanks!


No reason as such, but I have previously used Sunspot with acts_as_taggable for a University project and the setup was pretty straightforward.



Yes, Sunspot runs on top of the Solr search engine, and we have to install the solr gem. But compared with Ransack, a very basic issue with Ransack is that it recommends Ruby version 2.2+ according to its documentation. Sunspot supports all Ruby versions and Rails 3+ versions.

Since Publiclab is a text-based system (research notes and wiki pages) and should support the Q&A tool that is going to be implemented in the near future, we can use the advanced text searching facilities provided by Sunspot. And by using one text field (like the search box on publiclab.org) we can query easily through Sunspot. Nested (or recursive) searches can be handled using its faceting feature.

From Ransack_demo we can get a good idea of how Ransack works. It has a quick implementation for sorting. But I think its segmented search fields and detailed categorization may not suit PublicLab. As Jeff mentioned, Ransack has no external dependency requirement. But my guess is that, after the installation phase, we can get more advantage from Sunspot in our Publiclab scenario.


By the way, thank you very much @Jitesh for sharing your ideas on my proposal.


That's a great argument for sunspot. Does ransack not support full text search?

Thanks!



As far as I know, content-based full-text search with weighting is not available in Ransack. When we prioritize the tag field, I think these features may be very useful, especially for admins.
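To make "weighting" concrete, here is a minimal sketch in plain Ruby. It does not use the API of either gem; the `WEIGHTS` values, the `score` and `ranked` helpers, and the sample documents are all hypothetical. It only shows the idea that a match in the tag field counts for more than a match in the body text when ordering results.

```ruby
# Hypothetical weights: a tag match counts three times as much as a body match.
WEIGHTS = { tags: 3, body: 1 }.freeze

# Scores one document for one search term (case-insensitive).
def score(doc, term)
  t = term.downcase
  tag_hits  = doc[:tags].count { |tag| tag.downcase == t }
  body_hits = doc[:body].downcase.scan(t).length
  tag_hits * WEIGHTS[:tags] + body_hits * WEIGHTS[:body]
end

# Returns the matching documents, highest score first.
def ranked(docs, term)
  docs.map { |d| [d, score(d, term)] }
      .reject { |_, s| s.zero? }
      .sort_by { |_, s| -s }
      .map(&:first)
end

docs = [
  { title: "Kite or balloon?", tags: ["kite"],
    body: "A balloon is easier than a kite, but a balloon needs helium." },
  { title: "Aerial photo kit", tags: ["balloon"], body: "A kit for aerial photos." },
  { title: "Water sampling", tags: ["water"], body: "Sampling a river." }
]

puts ranked(docs, "balloon").map { |d| d[:title] }
```

Here the note tagged "balloon" ranks above the note that only mentions the word twice in its body, because one tag match (3) outweighs two body matches (2).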


Hi Jeff, above I expressed my ideas and findings about these gems. I'm completely OK with working with either the Ransack or the Sunspot gem. Since you have the overall picture of the plots2 project, I would like to hear your preference among the suggestions. It will be very helpful for my final GSoC proposal. Thanks!


Hi, Ujitha - thanks, I just wanted to hear specifics on the different options. It sounds like Sunspot is a great option. A couple of additional questions are:

  • how will testing work with an external service?
  • will this complicate initial install for new users, or do we assume they won't need to mess with the search service? Is there any simpler fallback that could be used? Just wondering, not essential



Hi jeff,

Thanks for your ideas and questions; they guide me to study these gems more deeply. For testing, we have to write test cases for the code that we write ourselves. And for the external service, there is a gem for Sunspot called sunspot-rails-tester.

But yes, for new users and for the production environment we have doubts. I then realised the risk that we would face, so I searched for other gems with no external dependencies, and I again arrived at the Ransack gem. It is based on MetaSearch, so there is no external dependency, and therefore no production or testing issues. We can write our own code with the support of the gem on top of Active Record. And it is better for sorting than Sunspot.

What do you think about this suggestion?

Thanks.



I think it's a tough call. I see benefits to both -- how consistent can we make the interface between Ransack and Sunspot? Could we standardize it enough that either could be used? That is, not do double the work, but architect it so that although we work in Ransack, a swap to Sunspot would not be too hard?

Let's not get bogged down if that's too much to think about at the moment. But it's one option that we could keep in mind.



I was thinking today that in searching, it's often very helpful to be able to see:

  • the author (if it's a note)
  • the # of stars a result has
  • maybe how recently it was updated or created?

in order to know better if it's a good match for what you're searching for. Maybe this can be factored into the plan. Thanks!
