Public Lab Research note


GSoC 2021 Proposal: Legacy Code Deprecation

by gauravsingh2699 | April 05, 2021 08:10 | #26138


About me

Name: Gaurav Singh

Email: gauravsingh2699@gmail.com, gauravsingh.191it218@nitk.edu.in

Github: https://github.com/gaurav2699

LinkedIn: https://www.linkedin.com/in/gaurav-singh-848600171/

Gitter: gaurav2699

Affiliation: National Institute of Technology, Karnataka

Study Year: 2nd year

Field of study: Information Technology

Location: New Delhi, India

Timezone: Indian Standard Time (UTC +05:30)

Project description

Deprecate legacy database models and code segments, fix the affected tests, and restructure the database. Along with that, produce an archival export of any deleted data and merge deprecated content types into standard ones.

Abstract/summary (<20 words):

Deprecate legacy Drupal-era models, migrate their data into standard tables, and remove the associated code and tests.

Problem

Public Lab was originally a Drupal site before it moved to Ruby on Rails. While most of the legacy Drupal code has been deprecated, several models still carry Drupal names. There are also several models and content types that are unnecessary and cumbersome, and removing them would make the code cleaner and the servers faster. This project therefore involves deprecating the remaining legacy code and models, and merging the deprecated content types into standard content types.

Implementation Plan

The deprecation work can be divided into six major parts:

  1. Deprecate the Answer model and merge it into the comments table.
  2. Deprecate the DrupalContentTypeMap model and convert all nodes of type 'map' into 'note' nodes.
  3. Deprecate the DrupalFiles and DrupalUpload tables and append each file path to the node body.
  4. Deprecate tag aliasing: add a tag's parent (if it exists) to the list of subtags, and delete all references to parents.
  5. Deprecate DrupalNodeMainImages and create new native image records from their stored paths.
  6. Deprecate DrupalContentFieldImageGallery and insert the gallery images at the start of the node revision body.

Deleting each model will cause some tests to fail, and those tests will need to be rewritten or restructured.

Let's talk about each part in detail:

A. Answer to Comments

The Answer and Comment models are easy to confuse because they share a lot of duplicated code, and they have to be maintained twice. Hence the Answer model should be deprecated and merged into the comments model. The basic approach is to write a Rails migration that moves all records from answers to comments, then delete all references to answers and update the affected code accordingly.

To generate the migration, we can use the command:

rails g migration MoveColumnDataToCommentsTable

The migration then copies every record from the answers table into the comments table.
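As an illustration, the core of such a migration is a per-record mapping from answers to comments. The plain-Ruby sketch below uses Structs in place of the ActiveRecord models so the mapping can be read and run on its own; the field names (`nid`, `uid`, `content`, `comment`, `timestamp`) are assumptions, not a confirmed copy of the plots2 schema.

```ruby
# Illustrative stand-ins for the two models; field names are assumptions.
AnswerRow  = Struct.new(:nid, :uid, :content, :created_at, keyword_init: true)
CommentRow = Struct.new(:nid, :uid, :comment, :timestamp, keyword_init: true)

# Builds one comment per answer, keeping the node, author, text and time.
def answers_to_comments(answers)
  answers.map do |answer|
    CommentRow.new(
      nid: answer.nid,
      uid: answer.uid,
      comment: answer.content,
      timestamp: answer.created_at.to_i # assumes comments store Unix seconds
    )
  end
end
```

In a real migration, the same mapping would run inside `up`, reading answer rows and writing comment rows, ideally with a matching `down` so the step can be rolled back.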

Since most of the answers-to-comments migration has already been done, this migration step may not be necessary.

(Screenshot: comments table before the migration.)

(Screenshot: comments table after migrating the only record in the answers table.)

Once the migration is done, we have to delete the Answer model.

The commands for removing the model and generating the drop-table migration are:

rails destroy model Answer
rails g migration DropTableAnswers

The final step is to remove, or modify where necessary, all references to the Answer model, including its tests, in the following files:

https://github.com/publiclab/plots2/blob/93415927cfbbf10bf42bff24eb542395e00f7103/test/functional/questions_controller_test.rb#L69

https://github.com/publiclab/plots2/blob/6faac02ddadd6a2d4ee03a60dee498969124604a/test/unit/comment_test.rb#L157-L176

https://github.com/publiclab/plots2/blob/6faac02ddadd6a2d4ee03a60dee498969124604a/app/controllers/stats_controller.rb#L66-L72

B. DrupalContentTypeMap to Nodes

Several nodes have the type 'map', which links them to the DrupalContentTypeMap model. Instead, we can migrate all 'map' nodes into 'note' nodes and merge all the information from the DrupalContentTypeMap model into the node revision body.

This can be done with a migration script generated the same way as above.
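A minimal sketch of the per-node conversion, assuming the legacy map fields can be read as a hash of name/value pairs (the struct fields and any field names are hypothetical): the node's type becomes 'note' and each non-empty map field is appended to the body as a bullet.

```ruby
# Stand-in for a node record; only the fields this sketch needs (hypothetical).
NodeRow = Struct.new(:nid, :type, :body, keyword_init: true)

# Converts a 'map' node into a 'note': the type changes and every non-empty
# legacy map field is appended to the body as a "* name: value" bullet.
def map_to_note(node, map_fields)
  details = map_fields
            .reject { |_name, value| value.to_s.strip.empty? }
            .map { |name, value| "* #{name}: #{value}" }
  body = details.empty? ? node.body : "#{node.body}\n\n#{details.join("\n")}"
  NodeRow.new(nid: node.nid, type: 'note', body: body)
end
```

Blank fields are skipped so converted notes do not fill up with empty labels. Returning a new struct instead of mutating the input keeps the sketch easy to test; in the actual migration, the node and its latest revision would be updated in place.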

(Screenshot: node table before the migration.)

(Screenshot: node table after the migration.)

Then we can deprecate the DrupalContentTypeMap model with:

rails destroy model DrupalContentTypeMap   
rails g migration DropTableDrupalContentTypeMap  

Now we can remove the remaining references to maps, such as the views in:

https://github.com/publiclab/plots2/blob/master/app/views/map/

C. DrupalFiles to Node Body

Now we have to get rid of DrupalFiles and DrupalUploads. This one is fairly straightforward and similar to the previous one.

This involves fetching the files of each node and appending them to the bottom of that node's content, using a migration file created the same way as before.
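A sketch of the appending step, assuming each legacy file row carries the id of its node and a stored path (both field names are hypothetical). Each path is added as a Markdown link under a short "Files:" line; whether the stored paths need a leading `/` or a host prefix to resolve on the live site is an assumption to check against real data.

```ruby
# Stand-ins for a node revision and a legacy file row (fields hypothetical).
RevisionRow   = Struct.new(:nid, :body, keyword_init: true)
DrupalFileRow = Struct.new(:nid, :filepath, keyword_init: true)

# Returns the revision body with a link to each of the node's legacy files
# appended at the bottom; the body is unchanged if the node has no files.
def append_file_paths(revision, files)
  paths = files.select { |file| file.nid == revision.nid }.map(&:filepath)
  return revision.body if paths.empty?
  links = paths.map { |path| "* [#{File.basename(path)}](#{path})" }
  "#{revision.body}\n\nFiles:\n#{links.join("\n")}"
end
```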

Each node body will then end with the paths of its files.

After this, we can deprecate DrupalFiles and DrupalUpload with the following commands:

rails destroy model DrupalFile
rails g migration DropTableDrupalFiles

rails destroy model DrupalUpload
rails g migration DropTableDrupalUploads

Now we can remove these files and lines:

https://github.com/publiclab/plots2/blob/master/app/models/drupal_file.rb

https://github.com/publiclab/plots2/blob/master/app/models/drupal_upload.rb

https://github.com/publiclab/plots2/blob/master/app/models/node.rb#L79

D. Tag Aliasing

The current tag aliasing system in Public Lab is fragile, involves a lot of cumbersome code, and is difficult to maintain. Hence, it should be deprecated and replaced with an automated retagging system.

The basic approach is a Rails migration script that finds the parent of each tag and, if one exists, adds that parent tag to every node bearing the original tag. Then we can manually add all the parent tags to the list of subtags. Finally, we can remove the parent field and all its references.

As before, this step is done in a new migration file.
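A plain-Ruby sketch of the retagging step, with a Set of tag names per node standing in for the node-tag join table (the `name`/`parent` fields mirror the parent field described above, but the structure is an assumption). It adds only each tag's direct parent, not a chain of ancestors.

```ruby
require 'set'

# Stand-in for a tag row with an optional parent tag name (fields hypothetical).
TagRow = Struct.new(:name, :parent, keyword_init: true)

# node_tags maps a node id to the Set of tag names on that node.
# Returns a new hash in which every node also carries the direct parent
# of each of its tags, if that tag has one.
def apply_parent_tags(tags, node_tags)
  parent_of = {}
  tags.each do |tag|
    parent_of[tag.name] = tag.parent unless tag.parent.to_s.empty?
  end
  node_tags.transform_values do |names|
    names | names.map { |name| parent_of[name] }.compact
  end
end
```

Because `Set#|` returns a new set, the input hash is left untouched, which makes it easy to compare before and after states when testing on a copy of the database.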

Next, we can manually add the parent tags to the list of subtags here:

https://github.com/publiclab/plots2/blob/d7a021d25cca367e1162cea73863c5fe9fe9bd6a/app/models/node.rb#L850-L855

For the last step, we have to remove all references to parents and delete the parent field. A migration that removes the column from the table can be generated with the following command:

rails generate migration remove_parent_from_term_data parent:string

A few places where the parent column is referenced are:

https://github.com/publiclab/plots2/blob/ee46fe0adf85ba4b64fca68cd844fa05d8eeaa42/app/controllers/tag_controller.rb#L473-L486

https://github.com/publiclab/plots2/blob/876d0fc084064aaecc23f8003630d7d1ab858fa1/test/unit/node_tag_test.rb#L47-L96

https://github.com/publiclab/plots2/blob/ac9c85e50c042d3922295c49a6047c3f2f591085/test/functional/tag_controller_test.rb#L185-L191

E. DrupalNodeMainImages to Native Image Records

Several nodes still use the old Drupal main image type. We need to deprecate all the DrupalNodeMainImage legacy code and create a new native image record for each of those images.

The basic approach is to first find all the nodes that use the old image type, then run a migration that creates a new native image from each stored path. Finally, we deprecate the drupal_main_image model along with its references and tests.

There are over 1,000 of these Drupal main images in the Public Lab database.

The migration script for this can follow the same pattern as the earlier ones.
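A sketch of the selection step for this migration: it turns legacy main-image rows into attribute hashes for new native image records, skipping rows without a path and nodes that already have a native image, so the migration can be re-run safely across the 1,000+ records. The field names are hypothetical, and the call that actually creates an image and attaches the file depends on how the Image model stores uploads, so it is left out.

```ruby
# Stand-in for a legacy main-image row (field names hypothetical).
LegacyMainImage = Struct.new(:nid, :uid, :filepath, keyword_init: true)

# Returns attribute hashes for new native image records. Rows without a
# path are skipped, as are nodes that already have a native image, so the
# migration can be re-run without creating duplicates.
def native_image_attributes(legacy_images, nids_with_native_image)
  legacy_images
    .reject { |img| img.filepath.to_s.empty? }
    .reject { |img| nids_with_native_image.include?(img.nid) }
    .map do |img|
      { nid: img.nid, uid: img.uid, path: img.filepath, title: File.basename(img.filepath) }
    end
end
```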

Then we can deprecate the DrupalMainImage model and its related code, for example in:

https://github.com/publiclab/plots2/blob/master/app/models/drupal_main_image.rb

https://github.com/publiclab/plots2/blob/master/app/models/node.rb#L309-L327

F. DrupalContentFieldImageGallery to Node

Public Lab has a very old leftover feature that shows images in a gallery on top of a note. The gallery is stored in the DrupalContentFieldImageGallery model. To deprecate this model while still displaying the images, we can insert each image's HTML code at the top of the node revision body. We can do this with a migration script.
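A minimal sketch of the gallery step, assuming the gallery's image paths can be collected in display order: it builds one HTML `<img>` tag per image and places them above the existing body. `CGI.escapeHTML` guards against quotes or angle brackets in stored paths.

```ruby
require 'cgi'

# Builds one <img> tag per gallery image and places them above the body.
# Blank or missing paths are skipped; the body is returned unchanged if
# no usable paths remain.
def prepend_gallery(body, gallery_paths)
  paths = gallery_paths.compact.reject(&:empty?)
  return body if paths.empty?
  tags = paths.map do |path|
    %(<img src="#{CGI.escapeHTML(path)}" alt="#{CGI.escapeHTML(File.basename(path))}">)
  end
  "#{tags.join("\n")}\n\n#{body}"
end
```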

This migration places each image in the node's gallery at the start of the node body. We can then deprecate the DrupalContentFieldImageGallery model, drop its table, and remove all its references, including the gallery code in files like node.rb.

Before deprecating any of the above models, we should export the database as an archive to protect against accidental data loss; the export can be deleted later once it is no longer needed. We can export the database with a simple command like:

sudo mysqldump --add-drop-table -u admin -p`cat /etc/psa/.psa.shadow` dbname > dbname.sql

Timeline/Milestones

Post GSoC

After the completion of the GSoC project, I will continue to be a part of the Public Lab community and the Open Source community. I will devote as much time as I can towards Open Source development to work with creative minds on challenging real-world problems while benefiting millions of users around the world.

Needs

I will require guidance, support and feedback from the mentors and everyone in the Public Lab community.

Contributions

I started my open source journey with Public Lab and I believe it is the most welcoming organisation for newcomers. Ever since joining this amazing community, I have made several contributions to Public Lab in the form of Pull requests, issues, helping other contributors, and welcoming the newcomers to Public Lab, and I will continue to contribute to Public Lab in the future.

Links:

FTO issues

Public Lab is a community-driven organisation that is very welcoming to newcomers. First-Timers-Only (FTO) issues really help beginners get started with contributing, and hence they are an integral part of Public Lab. I will try to create several FTO issues for simple tasks in this project. For example, this project involves removing a lot of model-related code after each model is migrated and deprecated, and we can create one FTO issue for each section of code to be removed. Code has to be removed from around 15 files, which amounts to more than 20 sections of code, so this project can yield at least 20 First-Timers-Only issues.

Experience

I am very passionate about programming and building new and exciting things. My favourite backend framework is Ruby on Rails, and I have worked on several projects with it. I also develop in Django and Node.js.

A few of my notable projects are:

Library Management System: a library management system made using Ruby on Rails.

Auction Web Application: an auction application made using Ruby on Rails where users can bid on products posted by other users.

Mailer: a mailer interface made using Ruby on Rails, where an admin can send mail to selected users or to users filtered by location, etc.

NEAT Implementation: studied and implemented NeuroEvolution of Augmenting Topologies to improve AI in games, made using JavaScript with a Flappy Bird game as the demo.

NES Emulator: a Nintendo Entertainment System emulator made from scratch using C++ and OpenGL.

Accent Classification: an accent classification application based on geographical location, using deep learning and web scraping.

DBSCAN Clustering and SINR: implemented the DBSCAN clustering algorithm in ns-3 for D2D and UE nodes, and calculated SINR values for each cluster, where each D2D node starts after 1 second.

Multithreaded Chat Room Web Server: a multithreaded chat room web server made using C++ and socket programming.

All projects are available on GitHub: https://github.com/gaurav2699

Teamwork

I am a web developer at IRIS, NITK, a student-run organisation that develops and expands a Ruby on Rails digital portal ensuring that all administrative, academic and alumni-related procedures in the college take place methodically. I have worked on various modules, such as the Group Management System and the Calendar Management System, and my work there included deprecating a few models and migrating their data to other models, which is similar to this project.

In Winter 2020, I interned at Ari, a bicycle-sharing startup at NITK Surathkal, where I designed and implemented the admin panel using Firebase and Bootstrap.

I have also interned at PSUP, which is also a startup in NITK Surathkal, where I developed Authentication and User models in Django.

Along with that, I am part of the web team of ECell, NITK where I develop and maintain their website.

Passion

I really love what Public Lab has been doing for the environment. I am from New Delhi, India which is one of the most polluted cities in the world, hence I really believe there is a great need for environmental awareness.

Along with that, I really love how welcoming Public Lab is to the newcomers. I have learnt a lot while contributing to Public Lab, and hope to have a great learning experience in the future too.

Audience

This project will mainly help the developers and contributors of Public Lab, as it will make the code less cumbersome and more readable. The Drupal models confuse new contributors, so removing them will make the Public Lab codebase easier to understand.

Commitment

As I will have my summer holidays during most of the GSoC coding period, I will not have any academic work or other commitments, so I can dedicate most of my time to Public Lab. I can easily devote around 25 hours per week to Public Lab. I look forward to contributing and working with Public Lab.


9 Comments

Hey @warren, @cess, @ruthnwaiganjo and all the community members of Public Lab. This is my draft proposal for the Legacy Code Deprecation project for GSoC 2021. There are still a lot of changes and details to be added. I would appreciate any feedback. Thanks a lot!



Awesome proposal @gauravsingh2699 🎉 🎉 Love the in depth explanation on how you plan to accomplish each task, the code inclusion, db updates and migrations 🎉 .

We expect some tests to fail and some may need to be rewritten during the implementation... Maybe you could allocate time for that in your timeline. Thanks for posting!

Thanks a lot @cess for the feedback! I allocated time for test rewriting and restructuring in the end. I will preferably try to rewrite the tests associated with each model just after deprecating that particular model.




I want to recognize this proposal's thoroughness, great work! I love that you're basing everything around migrations, which is just excellent. One thing that will come up is that we have a test server with a full copy of the live database on it. After doing local tests of migrations, you can push them to this server at https://unstable.publiclab.org in order to see how they run on a real live (giant) database. However, it's important to be able to push a rollback as well, or to otherwise be able to recover the db state so you can correct errors. For this reason I recommend we think through a) whether it's worthwhile to be very careful about making reversible changes in our migrations, with the "up/down" workflow, AND/OR b) coordinating with our sysadmin (@icarito in the chatroom and on GitHub) in order to find a way to recover a db snapshot or otherwise roll back state.

One other thing is that a lot of the migration of Answers to Comments is complete. Your plan for it is just right, however, I think we can skip several steps as all Answers have already been converted and I believe we can now simply delete those records. Small change, but thanks!

Thanks a lot for this detailed plan!



Thanks a lot @warren for this amazing feedback. Yes, I do think that making reversible changes with the up/down workflow is worthwhile. It is extremely convenient to just run rails db:rollback if a migration doesn't go as expected or we make a mistake. We will definitely face cases where a migration does not behave as we want, and being able to roll back will save a lot of time. In the worst case, we will have DB snapshots to recover the DB state, but for most cases an up/down workflow will suffice. Regarding the answers-to-comments migration, I will change the timeline and the implementation to account for the work that is already done. Thanks again!



Thanks for sharing @gauravsingh2699.

You have mentioned under the title _DrupalContentTypeMap to Nodes_ that "We can instead migrate all the 'map' type nodes to the 'notes' type nodes." However, the original post (https://publiclab.org/wiki/gsoc-ideas#2021+Ideas) says to merge and combine deprecated content types into standard ones; for example, Maps become Wiki pages. Have you thought about why setting the maps as type notes is better than wiki?

I love the comprehensive research. You can add a section of possible FTOs that can be created from this, I suspect, they'll be a number.


Thank you for the review @ruthnwaiganjo. Both this issue https://github.com/publiclab/plots2/issues/4072 and https://github.com/publiclab/plots2/blob/main/doc/DATA_MODEL.md mention that maps should be migrated to the notes type. I had also asked @warren in the chatroom about this, and he clarified that the notes type is preferred. I also think it's better to migrate map content types to notes than to wiki pages, because notes have more in common with map content types than wiki pages do, which makes the migration easier. Regarding FTOs, I will definitely add a section for that. What made Public Lab great for me was how welcoming it was to beginners, and I will definitely try to help other newcomers get started with Public Lab. Thanks a lot again!


Hey @ruthnwaiganjo, I added a section for FTOs. Thank you.




Hi @gauravsingh2699, I wanted to thank you again for your excellent proposal. It goes in-depth on the complex set of tasks needed for one of our key proposed ideas, and you worked to integrate our feedback as well. I really don't have very much to suggest and want to reiterate that we simply aren't able to select all the strong proposals we receive each year.

If I had to mention something, it might be that we place a very high value on cooperative skills and teamwork, and that we always love to see applicants thinking about how to post issues which welcome others into the work, specifically the "first-timers-only" issues you can read more about at https://code.publiclab.org. Seeing applicants who apply this set of skills to support others and help build a team is a very positive sign for us.

In any case we hope you'll apply with us again and once again thank you for your time and efforts, for which we're very grateful!


