Public Lab Research note


Outreachy proposal: Tag/topic system refinements

by aliciapaz | April 28, 2021 08:18 | #26380

About me

I'm Alicia Paz Rojas, a self-taught developer from Chile. Before diving into software development, I worked on ecological agriculture research and environmental education.
In 2020 I enrolled in a remote software development program where I learned Ruby, Ruby on Rails, and JavaScript while collaborating and pair-programming with students from around the world.
I have a great interest in complex systems, collective action, and art. I also love music, plants, people, and actively dreaming (not necessarily in that order).

Location: Chile

Project description

Abstract

Smooth out the user experience of the tag system by fixing bugs and adding new features, including an interactive category tree for topics and improvements to the Cytoscape view.

Problem

Public Lab's tag system is an important feature that allows users to categorize posts, follow topic-related content, and find users following or working on a certain topic. Because it connects people in this way, the tagging system can play an important role in fostering collaboration among users, one of the fundamental purposes of the organization.
Most environmental issues, such as those covered on Public Lab's platform, are complex and often involve several topics and/or subtopics. Furthermore, many of these topics are interrelated or nested within each other, which makes organizing the content both challenging and necessary for keeping it easily accessible.
Therefore, the tag and topic system is key for educators and general users to get the most out of Public Lab's website.

The current system has known bugs and feature requests that this project aims to tackle.
To that end, the MVP includes the following:

I. Bug fixes

The first part of the project will involve fixing the following high-priority bugs:

I.1. Tag subscription count displays inflated numbers on the stats/subscriptions page, as described in #7908.

This fix will require adjustments to the SQL queries driving the stats/subscriptions.

Also, since this bug has been fixed before and then re-opened, improving the tests related to this feature will help to reinforce the fix and quickly catch bugs in the future, if they arise. This improvement includes generating better data for such tests.
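As a rough illustration of how a join can inflate a count, and of the kind of test data that would expose it, here is a plain-Ruby sketch with in-memory rows. It is not the actual plots2 query, and I am only assuming that row duplication is one possible cause of #7908; the struct names, field names, and sample data are hypothetical.

```ruby
# Hypothetical stand-ins for subscription rows and tag-on-node rows.
SubscriptionRow = Struct.new(:user_id, :tid, :following)
NodeTagRow = Struct.new(:tid, :nid)

SUBSCRIPTION_ROWS = [
  SubscriptionRow.new(1, 10, true),
  SubscriptionRow.new(2, 10, true),
  SubscriptionRow.new(3, 10, false), # unfollowed, should never be counted
  SubscriptionRow.new(1, 20, true)
].freeze

NODE_TAG_ROWS = [
  NodeTagRow.new(10, 100),
  NodeTagRow.new(10, 101),
  NodeTagRow.new(10, 102),
  NodeTagRow.new(20, 100)
].freeze

# Naive count: one row per (subscription, tagged node) pair, which is what
# an INNER JOIN followed by COUNT(*) would return.
def joined_row_counts(subscriptions, node_tags)
  counts = Hash.new(0)
  subscriptions.select(&:following).each do |sub|
    node_tags.each { |nt| counts[sub.tid] += 1 if nt.tid == sub.tid }
  end
  counts
end

# Deduplicated count: distinct followers per tag, the in-memory equivalent
# of COUNT(DISTINCT user_id) grouped by tid.
def distinct_subscriber_counts(subscriptions)
  subscriptions.select(&:following)
               .group_by(&:tid)
               .transform_values { |subs| subs.map(&:user_id).uniq.size }
end

p joined_row_counts(SUBSCRIPTION_ROWS, NODE_TAG_ROWS)
p distinct_subscriber_counts(SUBSCRIPTION_ROWS)
```

With this data the joined count for tag 10 is 6 while the distinct count is 2, because each follower is repeated once per tagged node. Test fixtures with several nodes per tag and at least one unfollowed subscription would make this kind of regression visible.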

I.2. Misbehavior when choosing #ofuses as the sorting criterion: tags are supposed to be sorted in descending order by the number of posts that use each tag. In general they are, but the sorting misbehaves in several places:
image description

These tags seem randomly sorted, despite being under "order=desc" (see URL).
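The expected ordering can be stated as a small plain-Ruby sketch, which could also serve as the basis for a test. The `TagUse` struct, the treatment of a missing count as zero, and the tie-break by name are my assumptions, not the current behavior of the site.

```ruby
# Hypothetical tag record: a name plus a (possibly missing) use count.
TagUse = Struct.new(:name, :count)

# Sort by number of uses; ties are broken alphabetically so that the
# result is deterministic. A nil count is treated as zero.
def sort_by_uses(tags, order: 'desc')
  direction = order == 'desc' ? -1 : 1
  tags.sort_by { |tag| [direction * tag.count.to_i, tag.name] }
end

tags = [
  TagUse.new('water', 3),
  TagUse.new('air', nil),
  TagUse.new('soil', 10),
  TagUse.new('noise', 3)
]

puts sort_by_uses(tags).map(&:name).join(', ')
puts sort_by_uses(tags, order: 'asc').map(&:name).join(', ')
```

Making the tie-break explicit matters for tests: without it, tags with equal counts can come back in any order and a correct result may still look "random".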

I.3. The contributors' page (/contributors/tagname) lists users who have posted a note with a certain tag as well as users who have only commented on one. This is inaccurate because the users appear under the title "People who've posted", yet when clicking on a username we sometimes find that the user has not posted anything under that tag. The title should either change to "People who've contributed", or the contributor list should be broken down into "People who've posted" and "People who've commented".
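The second option (two separate lists) could look roughly like this in plain Ruby. The structs and field names are hypothetical stand-ins, not the plots2 models, and a user who both posted and commented appears in both lists here, which is a design choice open to discussion.

```ruby
# Hypothetical stand-ins for notes and comments.
TaggedNote = Struct.new(:id, :author, :tags)
NoteComment = Struct.new(:note_id, :author)

# Split contributors for a tag into people who posted a tagged note and
# people who commented on one.
def contributors_for(tag, notes, comments)
  tagged = notes.select { |note| note.tags.include?(tag) }
  tagged_ids = tagged.map(&:id)
  {
    posted: tagged.map(&:author).uniq,
    commented: comments.select { |c| tagged_ids.include?(c.note_id) }
                       .map(&:author).uniq
  }
end

notes = [
  TaggedNote.new(1, 'ana', ['water']),
  TaggedNote.new(2, 'ben', ['air'])
]
comments = [
  NoteComment.new(1, 'carla'),
  NoteComment.new(2, 'ana'),
  NoteComment.new(1, 'carla')
]

p contributors_for('water', notes, comments)
```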

One thing these three bugs (I.1, I.2, and I.3) have in common is that fixing them requires understanding the models and the associations between them. I noticed that the only way to get the "big picture" of how the database is organized is by looking at Public Lab's Data Model, which is a great place to start but is not detailed enough for someone trying to go deeper. The other option is to look directly at the schema and the models, but for a newcomer this can be tedious and confusing.

To provide an example, let's look at the Active Record Query that is (supposedly) causing the issue in I.1.

image description

To understand the first INNER JOIN we must look at the community_tags table in the schema. But if we want to look at the model to understand the associations with TagSelection, we won't find any "CommunityTag" model, because this table corresponds to the NodeTag model. The same happens with term_data and the Tag model.

Entity-relationship diagrams (ERDs) are a way to visualize a database in great detail, showing the different tables and the associations between them. An ERD is a great tool to improve the readability of a database and help newcomers get on board quickly.

There are many tools available to quickly create an ERD. Among them, DBML (Database Markup Language) is an open-source DSL that comes with a free visualization tool called dbdiagram. Since it supports Ruby on Rails integration, the ERD can be created simply by uploading the schema.rb. The result can be edited and shared on the same platform, or it can be exported as an image file and added to the wiki of plots2.
As an example, this is a partial ERD of the Public Lab website's database:
For the interactive version, check out this link. Note that when hovering on a table, the associations are highlighted.
image description

II. Improvements and feature additions

II.1 Retroactive deletion of tags added by banned users
This action should delete only the term_node/node_tag (instance), not the term_data (class). The community_tags table contains the entries to be deleted after a user is banned.

This is how I would implement it:
image description

It's worth noting that this would be an irreversible change. Even if the user is unbanned, the tags will be irrecoverable. Another way would be to filter these tags in all counts and displays (probably a bit more complicated, but the tags would be recoverable).
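The two options (irreversible deletion versus filtering at display time) can be contrasted in a plain-Ruby sketch. This is only an illustration with in-memory data: `TagInstance`, its fields, and the sample tag names are hypothetical, and a real implementation would go through the NodeTag model instead.

```ruby
# Tag definitions (the "class") are kept no matter what; names are made up.
TAG_DEFINITIONS = { 10 => 'water', 20 => 'spam' }.freeze

# Hypothetical stand-in for one tag instance: which tag (tid) was added
# to which node (nid) by which user (uid).
TagInstance = Struct.new(:tid, :nid, :uid)

# Option A: irreversible. Removes the banned user's tag instances in
# place and returns how many were removed.
def delete_tags_added_by!(uid, instances)
  before = instances.size
  instances.delete_if { |instance| instance.uid == uid }
  before - instances.size
end

# Option B: reversible. Leaves the data untouched and hides instances
# added by banned users whenever tags are listed or counted.
def visible_tag_instances(instances, banned_uids)
  instances.reject { |instance| banned_uids.include?(instance.uid) }
end

instances = [
  TagInstance.new(10, 100, 1),
  TagInstance.new(20, 100, 2),
  TagInstance.new(20, 101, 2),
  TagInstance.new(10, 101, 3)
]

puts visible_tag_instances(instances, [2]).size # filtered view
puts delete_tags_added_by!(2, instances)        # number of deleted rows
puts instances.size                             # rows left after deletion
```

Option B has to be applied consistently in every count and display, which is why I expect it to be more work, but unbanning a user then restores their tags automatically.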

II.2 Tag locking

  • There is already a "locked" tag for nodes, which allows a moderator or admin to lock a piece of content
  • Only moderators or admins can edit a locked post
  • The functionality can be extended so that only moderators/admin users can edit, add or delete tags on locked content.
image description
  • The UI of the implementation will include a button only visible to moderators and administrators. In pseudo-code:
image description
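The permission rule behind tag locking can be sketched in plain Ruby as follows. The structs, the `locked` flag, and the role strings are assumptions made for the sake of the example, not the existing plots2 API.

```ruby
# Hypothetical stand-ins for a user and a piece of content.
SiteUser = Struct.new(:name, :role)
Content = Struct.new(:title, :locked)

PRIVILEGED_ROLES = %w[admin moderator].freeze

# Anyone may change tags on unlocked content; on locked content only
# moderators and admins may add, edit or delete tags.
def can_edit_tags?(user, content)
  return true unless content.locked

  PRIVILEGED_ROLES.include?(user.role)
end

locked_page = Content.new('Getting started', true)
open_page = Content.new('Field notes', false)

puts can_edit_tags?(SiteUser.new('ana', 'basic'), locked_page)
puts can_edit_tags?(SiteUser.new('ben', 'moderator'), locked_page)
puts can_edit_tags?(SiteUser.new('ana', 'basic'), open_page)
```

The same predicate could decide both whether the tag form is rendered in the view and whether the controller accepts the request, so that hiding the UI is never the only safeguard.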

II.3 Prevent first-time posters from tagging work other than their own

This is a possible implementation:
image description
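In plain Ruby, the rule itself could be expressed roughly like this. The structs and the `first_time_poster` flag are hypothetical; how the site actually decides that someone is a first-time poster is outside this sketch.

```ruby
# Hypothetical stand-ins for a member and a post.
Member = Struct.new(:id, :first_time_poster)
Post = Struct.new(:id, :author_id)

# First-time posters may only tag their own posts; everyone else may tag
# any post.
def may_tag?(member, post)
  !member.first_time_poster || post.author_id == member.id
end

newcomer = Member.new(1, true)
regular = Member.new(2, false)
own_post = Post.new(100, 1)
other_post = Post.new(101, 2)

puts may_tag?(newcomer, own_post)
puts may_tag?(newcomer, other_post)
puts may_tag?(regular, own_post)
```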

II.4 Implement a category tree-like interactive topic feature

The current implementation of the topic list at wiki/topics displays a table.
Because of the above-mentioned problem about complexity in environmental topics, some of the sub-topics are repeated among different topics:

image description

In the screenshot above, "oil-and-gas" is an example of a sub-topic shared among different topics. In this case, the repetition helps the readability and effectiveness of the tag system, but it also takes up extra space.

For the MVP version of the feature, a simple tree-like feature will be implemented. In pseudo-code:
image description

By adding check-boxes to each toggleable category, a user could easily keep track of how categories are nested (styling will be improved of course).

image description
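To make the nesting idea concrete, here is a minimal plain-Ruby sketch that turns a nested topic structure into an indented outline. Apart from "oil-and-gas", the topic names are invented for the example, and the real feature would render HTML rather than text.

```ruby
# Hypothetical topic hierarchy: each key is a topic, each value holds its
# sub-topics. A shared sub-topic simply appears under more than one parent.
TOPIC_TREE = {
  'air-quality' => { 'oil-and-gas' => {}, 'dust' => {} },
  'water-quality' => { 'oil-and-gas' => {} }
}.freeze

# Walk the tree depth-first and return one indented line per topic.
def render_tree(tree, depth = 0)
  tree.flat_map do |name, children|
    ["#{'  ' * depth}- #{name}"] + render_tree(children, depth + 1)
  end
end

puts render_tree(TOPIC_TREE)
```

Because the renderer is recursive, deeper levels of nesting need no extra code; a collapsible UI would only have to hide or show the lines below a given depth.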

A further extension of the feature would be to implement something more interactive/graphical like using a filtered version of the Cytoscape visualization to dynamically generate a topic tree.

II.5 Refine display of interrelationship of topics

The "parent: " power tag lets you set a parent wiki page for a page, which is then displayed as a side card under the label "This is part of:"

For example, a node with the tag "parent: africa" will have this card on the side:

image description

However, there is no way to differentiate this tag from any other tag (power or regular) on the content page, as other tags will appear in the same format (see II.7 and #6593).

To differentiate parent tags from other tags, the former could be displayed under a different label, such as:
image description
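Separating parent tags from the rest is a simple partition on the tag name. The following plain-Ruby sketch assumes tags arrive as strings; the method name is hypothetical, and it accepts the prefix with or without a space after the colon.

```ruby
PARENT_PREFIX = 'parent:'

# Split tag names into parent pages (prefix removed) and all other tags,
# so that each group can be rendered under its own label.
def split_parent_tags(tag_names)
  parents, others = tag_names.partition { |name| name.start_with?(PARENT_PREFIX) }
  {
    parents: parents.map { |name| name.delete_prefix(PARENT_PREFIX).strip },
    others: others
  }
end

p split_parent_tags(['parent: africa', 'water', 'with:ben', 'parent:soil'])
```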

II.6 Separate author names with commas instead of "with" on co-authorship author link display on notes

Currently, the coauthorship feature uses the power tag "with", and this conditional statement adds that word between the names (in app/views/notes/show.html.erb):
image description

The first problem I encountered while looking at this file is readability.
Some lines are very long and conditional statements are written in a single line, which makes it difficult to understand and modify.
As a part of this fix, I propose refactoring this file a little bit (which would make a nice FTO).
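One way to keep the view readable would be to move the joining logic into a small helper. This plain-Ruby sketch shows the idea; the helper names are hypothetical, and I am assuming co-authors can be read from tag names of the form "with:username".

```ruby
COAUTHOR_PREFIX = 'with:'

# Extract co-author usernames from the "with:" power tags of a note.
def coauthors_from(tag_names)
  tag_names.select { |name| name.start_with?(COAUTHOR_PREFIX) }
           .map { |name| name.delete_prefix(COAUTHOR_PREFIX) }
end

# Build the author line: main author first, then co-authors, separated by
# commas, with duplicates removed.
def author_line(author, coauthors)
  ([author] + coauthors).uniq.join(', ')
end

tags = ['with:ben', 'water', 'with:carla']
puts author_line('ana', coauthors_from(tags))
```

In the real view each name would be a link rather than a bare string, but the joining step stays the same.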

II.7 Finalize display of tags and topics in the sidebar using small cards

This feature is currently in development in #8684 and has already been broken down into smaller tasks in #6593.
The main tasks on this feature would be to:

  • refine the styling of the implementation
  • make tags added through JS/Ajax display as cards (as stated in the PR)
  • revise the integration tests to ensure proper behavior in the production environment before the release

II.8 Improve Cytoscape visualization based on community input

  • In full-screen mode it is not possible to select the number of tags that are displayed, as it is in regular mode:

image description

  • In full-screen mode, some tag names are still very small for someone with reduced vision.
  • Many other feature ideas and fixes for this are discussed in #1502. A good way to work on this would be to re-activate the discussion and together decide which features would be most helpful to implement.

Needs

  • I will use the constant feedback from my mentors and the Public Lab GitHub community
  • To implement the Cytoscape improvements, I will need to study the documentation of this tool

Timeline

May 24 - May 30, 2021

  • Make changes to the internship project based on feedback from mentors*
  • Implement ERD for database friendliness
  • Create FTO issue to update README with these changes
  • Write a blog post (Outreachy's prompt): "Introduce yourself" or a different topic agreed with mentors

May 31 - June 6, 2021

  • Write tests for stats_controller, tag_controller, and any other relevant file
  • Fix bugs on topics/tag system
  • Create fixes-related FTO's when possible
  • Write a blog post to document the process

June 7 - June 13, 2021

  • Write tests for the tag-locking system fixes
  • Implement tag-locking system, breaking it down into smaller issues and creating FTO's when possible
  • Write integration tests for the new feature to ensure appropriate behavior in the production environment
  • Write a blog post (Outreachy's prompt): "Everybody struggles" or a different topic agreed with mentors

June 14 - June 20, 2021

  • Write tests for deletion of tags added by banned users
  • Implement retroactive deletion of tags added by banned users
  • Create an FTO related to these changes or any other suitable area
  • Write a blog post to document the process

June 21 - June 27, 2021

  • Write tests for category topic tree-like interactive feature
  • Implement feature on its MVP version, breaking it down into smaller issues and creating FTO's when possible
  • Write a blog entry (Outreachy's prompt): "Think about your audience" or a different topic agreed with mentors

June 28 - July 4, 2021

  • Write integration tests for the new category tree feature to ensure appropriate behavior in the production environment
  • Open or reactivate GitHub issue to discuss Cytoscape improvements
  • Write a blog entry to document the process

July 5 - July 11, 2021

  • Refine display of interrelationship topics using parent tags, breaking it down into smaller issues and creating FTO's when possible
  • Write a blog entry (Outreachy's prompt): Mid-point project progress or a different topic agreed with mentors

July 12 - July 18, 2021

  • Write integration tests to ensure the expected behavior of the changes in the production environment
  • Write a blog post to document the process
  • Create a related FTO if possible, or use another suitable topic

July 19 - July 25, 2021

  • Separate author names with commas instead of "with" on co-authorship author link display
  • Implement card-like display for tags/topics in the sidebar, breaking it into smaller issues and creating FTO's when possible
  • Write integration tests to ensure appropriate behavior in the production environment
  • Write a blog entry (Outreachy's prompt): "Career opportunities" or a different topic agreed with mentors

July 26 - August 1, 2021

  • Start implementing Cytoscape view improvements as discussed with the community, breaking it down into smaller tasks and creating FTO when possible
  • Write integration tests to ensure appropriate behavior in the production environment
  • Write a blog post to document the process

August 2 - August 8, 2021

  • Continue improvements of Cytoscape view
  • Write tests for these changes
  • Write a blog post to document the process
  • Work on Outreachy's prompt: resume

August 9 - August 15, 2021

  • Start implementing the extension of the category tree-like feature based on community feedback, breaking it down into smaller issues and creating FTO's when possible
  • Write tests for the introduced changes
  • Write a blog post to document the process

August 16 - August 22, 2021

  • Buffer week to complete possible pending tasks
  • Write final project progress blog post, highlighting achievements, pending tasks and further work

August 23 - August 24, 2021

  • Final feedback

* This feedback/improvement actually starts right after I publish this proposal, but this is when the internship officially starts.

** Blogging is going to be a key activity towards the documentation of the process. A post of this kind might cover a specific task of the week (i.e. something I struggled with), relevant thoughts, or pretty much anything I would like to share about that week's experience.

First-time contribution

As a newcomer in open source and the Public Lab community, I've been contributing in small but purposeful ways, trying to do my best and asking for help when I need it:


Experience

I've been coding for approximately a year. I started with free and open source projects like The Odin Project, and in November 2020 I joined a remote school to learn full-stack development, where I've been pair-programming and collaboratively learning with developers from Asia, Africa, Latin America, North America and Europe. I started with HTML/CSS, and then learned Ruby, Ruby on Rails and JavaScript.
Most of my GitHub projects have been developed for learning purposes, but I'm proud and happy to share some of them. Check out my pinned projects here!

Teamwork

I enjoy working in horizontal and collaborative teams, where learning can be a multi-directional process. I like learning from others as much as helping others to learn and accomplish their goals.

During my studies of software development, I have successfully completed more than 15 collaborative projects by pair programming with developers from other parts of the world.

I volunteered in a community orchard for 4 years, occupying different roles within the organization, such as community coordinator, facilitator/educator, and funding coordinator. During that time we successfully developed an environmental education and urban agriculture program, that involved more than 300 neighbors, students, and educators from the community.

I was a part of a music collective for 4 years, where I performed as a multi-instrumentalist and facilitator. During that time, I taught myself (with the help and constant feedback of my peers) to play several instruments, and shared my knowledge with other participants as an electric bass teacher.


Passion

I'm passionate about education, collective action, and applied ecology. I believe that knowledge plays a major role in changing our immediate reality, and should be available for anyone, always.

As an "environmental science person" diving into technology, I think that Public Lab is an awesome way of converging my interests in technology, creativity, and environmental issues.

What I love about Public Lab is that the principles that "rule" the organization (such as diversity, collaboration, and "open-sourceness") are very tangible across the entire workflow. I felt very welcomed when I first started contributing, as everyone is willing to help each other and share their perspective on things.

Audience

Educators from communities facing environmental injustice are the major target group of this project. By implementing the described improvements, I aim to help them find relevant resources more efficiently and get in touch with other educators and users working on similar issues, thus fostering collaboration between them.

Another target group consists of the potential contributors and newcomers who wish to get involved with Public Lab's open source community. Many of these contributors (such as myself) don't have degrees in CS or lack advanced skills, so I aim to help them get on board by creating FTO's on a regular basis and improving the readability of the source code.

Commitment

I am ready to make a full-time commitment to the internship. I usually work better during the day, from 9 am to 6 pm.




8 Comments

Also, since this bug has been fixed before and then re-opened, improving the tests related to this feature will help to reinforce the fix and quickly catch bugs in the future, if they arise. This improvement includes generating better data for such tests.

This is really a great observation. We've been really challenged to reproduce the error except using our testing servers like http://unstable.publiclab.org/ and http://stable.publiclab.org/, with a full copy of the live database. I'm really not sure what the issue is and to complicate things there are small differences in the databases we use and their versions and configuration. Ironing this all out may help!

mis-sorted tags by uses

Here you're totally right, good eye! The query is pretty complex. We cache the count, but probably not consistently enough. Here again tests may help!

Super impressed and appreciative of your database mapping approach and agree with this:

...involve understanding the models and associations between them. I noticed that the only way to get the "big picture" of how the database is organized, is by looking at the Public's Lab Data Model, which is a great place to start but is not detailed enough for someone trying to go deeper.

Great observations and direction here.

A further extension of the feature would be to implement something more interactive/graphical like using a filtered version of the Cytoscape visualization to dynamically generate a topic tree.

This is great and I appreciate that you're prioritizing the basic version and imagining a more expanded "next step"! Good project organization.

Great proposal! I'm curious, what areas do you have questions about? What is the part of the project you're most excited about?

Thank you!



Thanks for your observations and feedback!

This is my first project of this sort (like planning what I am going to code in the next X weeks, with X greater than 2), so I'm a bit nervous about being able to complete the tasks by the estimated time. In this regard, what do you think of the timeline? Is it realistic? Is there anything that could be improved to make it more realistic?

About your second question, I'm really excited about the Cytoscape improvements! Complex networks always arouse my curiosity and I think there is a lot of room for creativity and collaboration in that feature. I also think it will be challenging, that's why I put it towards the end, so I will have time to study the documentation and get familiar with the tool.


I think it's good to be cautious about timelines. My ideal is to put the most urgent issues first, unless that disturbs some important sequencing. For example, @noi5e would perhaps have had an easier time doing a React project first, because it replaces a lot of code. But, we knew ahead of time that it would be the "riskiest" and most experimental -- and also least urgent -- part of the project. So we worked together to place it last, and did testing and bugfixing first, followed by a deeper structural rewrite. Finally, React happened last.

https://publiclab.org/notes/author/noi5e

I like what you're saying about Cytoscape and I think your instinct is right to put it towards the end!




This is a very informative proposal @aliciapaz, I hope you have submitted it or are planning to submit it in the final proposal for Outreachy on their website before the deadline on Monday 4pm UTC.

I love the tool you have shared to visualize the database, it's pretty good. I also like the in-depth research you have done in each section.

I haven't seen what kind of tests you plan on writing? Unit, feature, system?

It is also good to think about what aspects of your tasks can be improved for user accessibility.

For mapping the tags using a checkbox, is there a reason a checkbox is better than say maybe a drop-down? Is a checkbox what is used on displaying sub-items in a list of items?

Also, this is something just to think about, for the deletion of tags by banned users, I like the method you have shared, but what about the ones already existing in the system, how can we address that?

Thank you, and all the best in your application!


Hi @ruthnwaiganjo, thanks for your detailed feedback!

I'm planning to write both unit and integration / functional tests (I'm not sure I really understand the difference between the latter two...as far as I know, integration ensures that the pieces work well together, while functional looks at the final result in the production environment, without caring about how that result is produced). I think that unit and integration tests will be especially helpful for the bug fixes, while integration and functional tests will be most important for the new features.

About accessibility, I'll definitely give it deeper thought, but this is what I can think of now:

  • Using semantic HTML tags
  • Ensuring that all buttons and other UI elements are accessible through the keyboard and on touch devices
  • Ensuring that if an element changes dynamically, a visually impaired person will be able to know about it (particularly important for the Cytoscape view!)
  • Using accessible color combinations

About the checkbox, to be honest, I saw it in an example when gathering references and I liked it, but I don't think it is necessarily better than just a regular dropdown like you say, which will probably do the job just fine. Perhaps checkboxes can cause confusion, which gives me reason to use a normal dropdown instead.

And the last thing, I believe the query wouldn't be so different from the one described. This is what I would do:

  • Query the user table to find the ids of all banned users. Using 'pluck' we could get these ids in the form of an array.
  • Query the community_tag table to find all tags where the uid is in the array returned above.

I would need to read some documentation about how to do this on the production database without putting the rest of the data at risk.




This is a really exciting proposal!



Now that I've read SYSTEM_TESTS.md in the project, that is what I meant by "integration tests"! (using Capybara to test the UX and JavaScript).



Hi @aliciapaz - I wanted to write to say that we really didn't see any shortcomings in your proposal at all, and would have been very happy to work with you. We simply don't have funding or capacity to work with all our picks, and we also can't accept too many people for the same project. Please know that your work and your contributions were really fantastic and we wish you the very best. We hope you'll consider re-applying next round and we would be grateful for another chance to work together. Please reach out if you have any questions.
