
Public Lab Research note


GSoC 2020 proposal: Spam Management Dashboard

by keshav_sethi0004 (new contributor) | March 12, 2020 17:12 | #23129

Name: Keshav Sethi

Email: f20170657@pilani.bits-pilani.ac.in

Github: https://github.com/keshavsethi

Affiliation: Birla Institute of Technology and Science, Pilani (BITS Pilani)

Location: Rajasthan, India

Timezone: Indian Standard Time (UTC +05:30)

Project description

Improving the Spam Management Dashboard at publiclab.org/spam and implementing features such as bulk moderation, a weekly digest, automatic tagging of posts, etc.

Abstract/summary

This project focuses on improving the spam management dashboard at https://publiclab.org/spam and refining its user interface so that moderators can approve posts or mark them as spam more easily. The core features are bulk moderation, a daily/weekly digest, automatic tagging of posts, advanced tables, review of flagged comments/posts, a "my queue" review page, moderation rules and other relevant information, finding and removing old inactive accounts and spam, and better automatic spam detection. Together, these features will make moderation easier and faster and improve the quality of content at Public Lab.

Problem

Currently Public Lab (PL) has no bulk moderation feature, so moderators must approve or reject posts one by one, which is tedious. There is also no digest of flagged posts, so a moderator always has to visit the portal to moderate. Moderation is completely manual, and task management among moderators is weak. Old and inactive spam accounts also need moderation. This project addresses all of these problems through the spam management dashboard.

Note: All the mockups/prototype and code samples are just for reference. They can be revised/updated according to the requirements of the project, moderator preferences and suggestions given by mentors.

Project Goals and its implementation

(Design is explained in later parts of the proposal)

1. Bulk Moderation:

This will let moderators approve, ban, or mark as spam any number of users/posts/comments in one click. It is supported by manual selection, an overall search option (by user, tag, comment, word, etc.), sorting, and other techniques.

Full page search/specific search and apply bulk action (sample preview)

image description

For this, DataTables, a plug-in for the jQuery JavaScript library, can be used. It lets a moderator run a specific search or a regex search, sort dynamically, and change the number of rows shown per page.

For example, suppose a moderator wants to moderate all the flagged posts of a certain user. The moderator can search for the username, a keyword, or any pattern. All matching rows will appear, and the moderator can select all of them in one click and apply any bulk action.

image description

After putting Donate and Stats into spam with bulk moderation (preview)

image description

Features of bulk moderation are explained in detail:

  • Body filter: Filtering in bulk moderation accepts multiple space-separated words and matches any row containing all of them, in any order, so a match can span multiple columns. The columns of the bulk moderation table are Author, Title, Type of content, a checkbox, and an action button. The filter searches across all pages of the table and also supports regex.
  • Column sorting: Multiple columns can be ordered at the same time, taking one column as the reference. Every column can be sorted in both ascending and descending order. If a moderator wants to sort only wikis, they can search for wikis first and then sort the results. This helps extract relevant information in many different ways.
  • Dynamic pagination: Moderators can change the number of rows per page; for example, if a page shows 10 rows, the moderator can change that number as needed.
  • State save: The table can save its state (paging position, ordering state, etc.) so that it can be restored when the user reloads the page or comes back to it after visiting a sub-page.
  • Child rows: Child rows can be attached to a parent row in the data table. They can show additional information about a row, such as tags, time of posting, etc.
  • Bulk actions: After filtering and selecting the desired posts, the moderator can apply bulk actions, i.e. bulk publish, user ban, and marking as spam.

image description
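The body filter rule described above (every search word must appear somewhere in the row, in any order and in any column) can be illustrated with a short sketch. In the proposal this matching would be done in the browser by DataTables; the plain Ruby below only illustrates the rule, and the `Row` struct and `filter_rows` helper are hypothetical names, not part of plots2.

```ruby
# Hypothetical row shape using the columns named in the proposal.
Row = Struct.new(:author, :title, :type, keyword_init: true)

# Returns the rows in which every space-separated search term appears
# somewhere in the row (any column, any order), ignoring case.
def filter_rows(rows, query)
  terms = query.downcase.split
  rows.select do |row|
    haystack = row.to_a.join(" ").downcase
    terms.all? { |term| haystack.include?(term) }
  end
end

rows = [
  Row.new(author: "alice", title: "Balloon mapping kit", type: "note"),
  Row.new(author: "bob",   title: "Cheap watches",       type: "comment"),
  Row.new(author: "bob",   title: "Water sensor",        type: "wiki")
]

# "wiki bob" matches across the type and author columns.
puts filter_rows(rows, "wiki bob").map(&:title).inspect
```

An empty query matches every row, which mirrors an empty search box showing the full table.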

Note: All these bulk moderation features can be accessed only by Admins and moderators.

2. Daily/Weekly Digest:

Posts with the highest flag counts will be sent as a daily or weekly digest, depending on the moderator's settings. This can later be extended to tagged posts. From the digest, the moderator can open each item for review through a "read more" link.

The digest can be sent by a separate method, e.g. "send_digest(moderator,flag_comment)", whose arguments are the Public Lab moderator and the flagged comments for that moderator's subscriptions/digest notes. This method would go in the subscription.rb file. user.rb already has a method, content_followed_in_period(start_time,end_time), which gives us the digest notes. These comments would be passed (as flag_comment) to the 'send_digest' method in the subscription mailer. Sending should happen asynchronously so that it does not put much load on the servers. Rails provides a built-in framework for asynchronous tasks, Active Job.

For scheduling digests, the whenever gem can be used. It creates a cron job that calls the job every 24 hours.

Digests provide different interfaces based on the moderator's preference in settings:

  1. Digests that remind moderators to come and moderate spam. This interface will contain the details shown on the dashboard (explained later in the UI design section), such as pending moderation requests, the count breakdown, and moderation requests for followed tags.
  2. Digests containing posts/comments for followed tags. This interface will contain all the posts tagged with tags that the moderator follows. (The Queue)
  3. Digests containing flagged posts/comments. This interface will contain all flagged posts, ordered by flag count. (Flagged posts)

image description

image description

Note: Moderators will be able to set the digest frequency and the type of interface. These settings are dynamic, so a single moderator can receive multiple digests.

For the digest UI, the moderation_digest.html.erb file will contain the template of the email/digest.
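Since the scheduled job is described as running every 24 hours while moderators choose their own frequency, the job needs to decide who is due on a given day. The sketch below shows one way to do that in plain Ruby. The `ModeratorPref` struct, its field names, and the choice of Monday for weekly digests are all assumptions made for illustration; none of them come from plots2.

```ruby
require "date"

# Hypothetical preference record; field names are illustrative only.
ModeratorPref = Struct.new(:username, :frequency, keyword_init: true)

# Assumption: weekly digests go out on Monday (Date#wday == 1).
WEEKLY_DIGEST_WDAY = 1

# The scheduler is assumed to call this once every 24 hours.
# Daily subscribers are always due; weekly ones only on the chosen weekday.
def digest_due?(pref, today)
  case pref.frequency
  when :daily  then true
  when :weekly then today.wday == WEEKLY_DIGEST_WDAY
  else false
  end
end

# Usernames that should receive a digest on the given date.
def recipients_for(prefs, today)
  prefs.select { |pref| digest_due?(pref, today) }.map(&:username)
end

prefs = [
  ModeratorPref.new(username: "mod_a", frequency: :daily),
  ModeratorPref.new(username: "mod_b", frequency: :weekly),
  ModeratorPref.new(username: "mod_c", frequency: :never)
]

puts recipients_for(prefs, Date.new(2020, 3, 16)).inspect # a Monday
puts recipients_for(prefs, Date.new(2020, 3, 17)).inspect # a Tuesday
```

In a Rails app the selected recipients would then be handed to the mailer through a background job, so that sending does not block the scheduler.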

image description

3. Automatic tagging of posts:

Comments and wikis can be split into sentences and words, and each word is assigned a score. Based on the total score, each post/comment can be tagged as spam, positive, negative, or with other tags.

image description

image description

If the total score falls within one of the following ranges, the corresponding tag is assigned.

image description

image description

This is the word.json file, in which a sample of words is scored according to sentiment and nature. Moderators can update it on the settings page. It acts as an automatic scan of comments/posts: malicious and unwanted words are detected and the nature of the content is predicted.

Example: Suppose a comment has the content "go for shopping at xyz.tk". The dots, commas, etc. are treated as separators, and the resulting array of words is compared with words.json to give a total score. In this example, tk has a score of -30, so the comment is marked as spam. This avoids reading long posts and comments in full and makes obvious spam easy to detect. Words that are unacceptable under PL policy can be given a very high score, and a new category can be created to automatically ban the authors of such comments/posts.

Note: Every post will carry a tag such as spam. Each tag will have its own color, e.g. red for spam, green for positive, and grey for negative. This gives moderators a clearer view at a glance.

Goal extension: This goal can be extended in the future by adding a comment classifier or some other ML model, which would be a completely separate project. Services such as Akismet (and other paid services) can also be used for automatic tagging of posts. But before ML, a platform for manual spam management is required, since it is more reliable than any other method.
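The scoring example above can be sketched in a few lines of Ruby. Only the `tk => -30` entry comes from the example; the other word scores, the two thresholds, and the extra "neutral" tag are placeholders, because the actual score ranges appear only in the mockup images. `score_text` and `tag_for` are hypothetical helper names.

```ruby
# Stand-in for words.json. Only "tk" => -30 is taken from the proposal's
# example; the other entries are placeholders.
WORD_SCORES = { "tk" => -30, "shopping" => -5, "thanks" => 5 }.freeze

# Assumed cut-offs; the real ranges are defined in the proposal's mockups.
SPAM_THRESHOLD = -30
POSITIVE_THRESHOLD = 5

# Splits on anything that is not a letter or digit (dots, commas, spaces),
# then sums the score of every known word. Unknown words score 0.
def score_text(text)
  words = text.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
  words.sum { |word| WORD_SCORES.fetch(word, 0) }
end

# Maps a total score to a tag. "neutral" is an assumed fallback tag.
def tag_for(text)
  score = score_text(text)
  return "spam" if score <= SPAM_THRESHOLD
  return "positive" if score >= POSITIVE_THRESHOLD
  score.negative? ? "negative" : "neutral"
end

puts tag_for("go for shopping at xyz.tk") # the example from the text
puts tag_for("Thanks for sharing!")
```

Because the word list is plain data, moderators could edit it from the settings page without any change to the scoring code.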

4. The Queue:

This feature lists all the posts carrying a tag that the moderator follows (see the example below). Posts are tagged by users, and the queue contains every post whose tag is followed by the moderator. All features of /spam/wiki, such as bulk moderation, ban user, approve, and spam, will be available on this page. This page will improve task management and make moderation easier.

image description

Moderators can change their tag preferences in the settings section; an admin can also allot tags to a moderator.

For example, a user adds "something_xyz_abc" as a tag on a post/comment, and a moderator follows that same tag. The post/comment will then appear in that moderator's queue, because its tag is followed by or allotted to the moderator.

Task management: The queue helps divide the work. Suppose there are two moderators, A and B. A follows tags like air quality, water, etc., and B follows tags like gsoc, tech software, etc. A's queue will receive posts related to air, water, etc., while B's queue will receive posts related to tech, gsoc, etc. This helps grow a moderation team with responsibility-sharing features like tag-filtered moderation and digests.
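The two-moderator example can be written out as a small filter: a post belongs in a moderator's queue when it shares at least one tag with the tags that moderator follows. This is a sketch in plain Ruby with a hypothetical `Post` struct and `queue_for` helper; the real tag lookup in plots2 would go through its own models.

```ruby
# Hypothetical post shape: a title plus the tags users added to it.
Post = Struct.new(:title, :tags, keyword_init: true)

# A post is queued for a moderator when it shares at least one tag
# (compared case-insensitively) with the moderator's followed tags.
def queue_for(posts, followed_tags)
  followed = followed_tags.map(&:downcase)
  posts.select { |post| (post.tags.map(&:downcase) & followed).any? }
end

posts = [
  Post.new(title: "DIY air sensor",    tags: ["air-quality"]),
  Post.new(title: "River sampling",    tags: ["water", "spring"]),
  Post.new(title: "GSoC planning",     tags: ["gsoc", "software"]),
  Post.new(title: "Untagged question", tags: [])
]

moderator_a = ["air-quality", "water"]
moderator_b = ["gsoc", "software"]

puts queue_for(posts, moderator_a).map(&:title).inspect
puts queue_for(posts, moderator_b).map(&:title).inspect
```

Untagged posts fall into nobody's queue in this sketch, so they would still need to surface on the general moderation pages.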

Note: This is similar to a subscription page but it can be used for moderation. All the features of Bulk moderation will be there.

5. Flagged post:

Flagged posts are the riskiest of all posts, so they need the most attention from moderators. Posts will be listed in the flagged section and sorted by the number of flags they have received from users. Flagging is already used in other sections of PL but not in the moderation tool. This section will have all the features of /spam/wiki, such as bulk moderation, ban user, approve, and spam. The most-flagged posts will also be sent to moderators as a daily/weekly digest.

image description

For this, a flag_count can be maintained, and all the flagged comments/posts can be sorted by that count in spam/flag.
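As a rough illustration of the ordering, the sketch below keeps only items with at least one flag and sorts them by flag_count, highest first. Breaking ties by showing the newest item first is an assumption, as are the `Flaggable` struct and the `flagged_queue` helper name.

```ruby
# Hypothetical record for a flagged post or comment.
Flaggable = Struct.new(:title, :flag_count, :created_at, keyword_init: true)

# Keeps flagged items only, ordered by flag_count descending.
# Assumption: ties are broken by newest first.
def flagged_queue(items)
  items.select { |item| item.flag_count > 0 }
       .sort_by { |item| [-item.flag_count, -item.created_at.to_i] }
end

items = [
  Flaggable.new(title: "A", flag_count: 2, created_at: Time.at(100)),
  Flaggable.new(title: "B", flag_count: 5, created_at: Time.at(50)),
  Flaggable.new(title: "C", flag_count: 5, created_at: Time.at(200)),
  Flaggable.new(title: "D", flag_count: 0, created_at: Time.at(300))
]

puts flagged_queue(items).map(&:title).inspect
```

In the Rails app itself the same ordering would more likely be expressed as a database query so that it works together with pagination.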

image description

6. Settings and banned user list:

The daily digest and bulk moderation features depend on moderator/user preferences such as digest frequency, rows per page, and followed tags. Apart from these, the settings section will also contain a list of banned users with an option to unban them, and a list of moderators for future reference. This will make unbanning users easier and help users tag moderators in the future.

The section will also contain all the moderation rules, for better moderation.

image description

Sample preview of modal(ban/mod/rules)

This is sample code for a moderator following a certain tag to populate the queue:

image description

Policy for banning a user: Users can be banned if their count of spam posts/comments exceeds a certain limit (e.g. 5), but a moderator has the right to ban or unban any user at any time.
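The policy combines an automatic threshold with a manual override, which can be sketched as follows. The limit of 5 is the example value from the text; the `banned?` helper and the `:ban` / `:unban` symbols are hypothetical and only show how a moderator's decision would take precedence over the count.

```ruby
# Example limit from the proposal; in practice this would be configurable.
SPAM_LIMIT = 5

# A moderator's explicit decision (:ban or :unban) always wins.
# Without one, the user is banned once the spam count exceeds the limit.
def banned?(spam_count:, moderator_decision: nil, limit: SPAM_LIMIT)
  return moderator_decision == :ban unless moderator_decision.nil?
  spam_count > limit
end

puts banned?(spam_count: 6)                              # over the limit
puts banned?(spam_count: 5)                              # at the limit
puts banned?(spam_count: 10, moderator_decision: :unban) # manual unban
```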

7. Insight section for moderators :

In the insight section, a moderator can see stats on how many posts have been moderated, approved, marked as spam, etc. This will motivate moderators to improve their stats and motivate users to write quality posts that get approved. The Chartkick gem can be used for this: each click on the publish/spam/ban button updates the stored counts, and the change is reflected on the chart. The chart will be fully responsive and informative.

image description

Overall UI Improvements and Design Details

image description

(This is a preview of a prototype for All posts)

At the left there is a navbar for the spam management dashboard, with the following items:

Dashboard: This contains moderator details such as User ID, statistics, etc. The statistics include posts (wikis and revisions) moderated, comments moderated, posts/comments pending moderation, followed tags, and the count of posts with each tag. This gives a moderator an overall idea of the day's moderation tasks.

All posts: This page contains all posts by all users, whether banned, marked as spam, published, or hidden. It serves as a general-level reference: any moderator or admin can see all the posts. For example, with thousands of posts published, a moderator can look through them to see why some were banned or hidden while others were published.

Bulk moderation: This page will contain all the wikis/revisions/comments that are still awaiting moderation (pending posts requesting publication). It will have all the features of bulk moderation described in the implementation section.

Flagged posts: All flagged posts, sorted by flag count. (All the features of bulk moderation will also be there)

The queue: This section contains all the posts/comments whose tags are followed by the moderators. (All the features of bulk moderation will also be there)

Wiki: This page contains wiki with all the features of Bulk moderation. This is filtered with only wikis. (All the features of bulk moderation will also be there)

Revision: This page contains only Revisions with all the features of Bulk moderation. This is filtered with Revisions. (All the features of bulk moderation will also be there)

Comments: This page contains only comments with all the features of Bulk moderation. This is filtered with comments. (All the features of bulk moderation will also be there)

Users: This page contains a list of all the users who are banned and who are active. In this page moderator can unban or ban a user. This will also help in future reference. (All the features of bulk moderation will also be there)

Insight: This page contains all statistics of moderators in graphical format as mentioned in its implementation.

Settings: This page contains all the preferences of the moderator like digest preference, follow tags, update inappropriate word list for automatic tagging of posts.

Note:

  1. The navbar on the left can be hidden with a button to widen the area of data table which will increase its visibility
  2. There will be different tabs for all the features like select all, select none, ban user, publish, spam and ban, hide, etc. This will help moderators to do all tasks in one click.

Timeline/Milestones

image description

image description

Needs

I will need guidance from my mentors and suggestions from the members of Public Lab to help me complete my project.

Contributions

I have been an active member of Public Lab since December 2019. I have made a good number of contributions to Public Lab, especially to plots2. My main area of contribution in plots2 is spam management, and I have contributed to both the frontend and the backend. I have received a great deal of help from all the mentors and have tried to improve my contributions. I have also created a good number of first-timer issues. I have made around 10 commits so far and have created around 15 issues in Public Lab. I am currently working on some issues and will continue to work on them in the future as well.

Links :

Comments

Issues

PR

Experience

I am a 3rd-year student at BITS Pilani and I have been doing web development for 2.5 years. I have done many projects in JavaScript, Angular and Ruby. I did a 3-month internship at Sun Mobility, Bangalore, where I worked on their asset management software and battery tracking system. This software was built in JavaScript and Angular 7. I also won the Smart India Hackathon 2019, where I built a tsunami prediction and alert mechanism using JavaScript and Django. I have also worked with some startups on their software management, and on some college projects and college fest websites. I learned Ruby from Udemy (Rob Percival) and made projects like blogging software, an Instagram clone, etc. I have a solid understanding of Ruby syntax and a strong grasp of OOP. I am also familiar with MVC, REST, and gems like Chartkick, Resque, etc. I am proficient with code versioning tools like git.

Team Participation

I have participated in various team competitions at different levels. As mentioned, I worked in a smart network team during my internship, and SIH 2019 was also a team event. I have also participated in various college team events. At Public Lab I have gained a lot of experience and guidance from every member, especially @bansal_sidharth2996 @VladimirMIkulik @cess @jwarren @nstjean. Working with a community like Public Lab is truly an honor for me.

Passion

What about our projects, and Public Lab, interests you? What are you passionate about? Open science, environmental justice?

I really love the concept of Public Lab and the work it is doing for the environment. A platform where people can share their ideas and research work related to the environment is a real necessity today. The Public Lab community also motivates me to work for the environment, as its members are hardworking and focused on their goals. Apart from plots2, projects such as MapKnitter, Spectral Workbench, kites, and balloons are assets to our society. Love for the environment and mother nature inspires me to work for this organization. I believe this platform needs better content, and spam moderation is required for that. It will improve both the quality and the reliability of the content.

Audience

Whom do you want your work to help? We especially appreciate proposals that make technologies and techniques more welcoming and friendly to those who've often been excluded.

This project will help all users by giving them quality content free of spam and negative comments. It will also help all the moderators by making their tasks easier.

Commitment

Do you understand this is a serious commitment, equivalent to a full-time summer job? Tell us how you'll structure your schedule from day to day!

Yes, I fully understand that this is a serious commitment and I can devote 40-45 hours weekly for the completion of this project.


20 Comments

Those who are reading please give your feedback Thanks:)



Hello Keshav, I left some feedbacks at Google Docs thanks for incorporating them. It is an awesome proposal. I am just thinking about each idea you proposed. Few Suggestions:

  1. From the timeline, you can shift the reading documentation of gems in the community bonding period. We totally understand you have end term exams, so you can read them before your end semester exams too. No need to worry in end sem exams.
  2. Regarding Bulk Moderation as I suggested previously, please provide the implementation details, like MVC, involved names, which type of sorting filters you will be incorporating, like by title, date of last modification, there will be bulk moderation (a separate web page accessible by only admins/moderators OR under settings ?), etc.
  3. Please separate the design details from the implementation details. Example: Bulk Moderation is a high-level design while the Daily Digest is fine grain implementation. Solution: Divide it into 2 parts - Design and implementation. Write just the approach which you want to discuss in the design with er diagram/flow chart/dfd/ui/ux/mock ups etc. While in the implementation details write the core implementation details. Reason for this type of categorical arrangement: Many folks at PL are non-coders so they will be willing to help you, the design part is for them. Developers look at both.
  4. Automatic tagging of posts: kindly consult Jeff regarding ML involvements here as he is much more experienced in handling multiple repos than me. This is really gigantic thing. I don't think we should include this in GSoC. What approach will you follow if we are ready to incorporate the ML engine and there are a 1% false classification rate?
  5. My Queue: kindly name the Queue. Secondly, please explain again. Sorry I am not able to understand it completely.
  6. Flagged post: PL has a variety of posts (research notes, wikis, maps, static pages etc.) Kindly mention specifically what you mean by post. Secondly, flagging is done on posts and comments too. Kindly include comments too.
  7. Can you kindly check the behavior of banned user profiles? Example Scenario: X is an intruder at PL. X is initially behaving like a diligent fellow for some weeks. X wrote comments on some profile, initially good ones and then bad ones. Now, Moderator M bans X. Then what will happen to the comments by X (both good and bad ones)? Also, if user Y checks pl/profile/X what will they observe?
  8. Kindly break spammers registration and old inactive users there is a plethora of things to discuss.
  9. Loved Insight section.

Kindly take your time and modify the proposal accordingly. Thanks for such great work at pl and superb proposal!


Thank you bhaiya for such a detailed and awesome feedback. I have made some changes in proposal. Can you please review it. Thanks :)




Hi @keshav_sethi0004, really love your proposal, love the mockups and you included code snippets ❤️ One thing to add on top of what @bansal_sidharth2996 requested: under the Contributions subheading, please add links to the pull requests and issues you have created on any of our repositories. Thanks

Thank you for your feedback. Links for PRs, issues and comments are added. Thanks : )




This is a great proposal! Echoing @bansal_sidharth2996, maybe Automatic tagging of posts could be held as a "stretch goal" for the end. I think there are some good modules like Akismet and others that could be used to provide "recommendations" for whether new posts are spam or not, and we could show these as icons in the list view to help you decide.

Another way we could do this is we could have a button that says "suggest a categorization" or something which would check the boxes for you, of the list items that seem like spam according to some algorithm, Akismet, machine learning, or whatever, and you could visually review them but otherwise just use bulk actions to mark them all spam.

I'd love to hear a bit about how you'd approach testing, both of the underlying system (functional and unit tests) and of the UI (system tests). This helps your code remain maintainable and readable as well!

Finally, perhaps some periods of discussion with site users is good to factor in so that you know if the systems you are designing are a good fit for people.

Thank you!!

Thanks @warren for your suggestions.


Yes, I am thinking of using Akismet as a stretch goal. As far as automatic tagging is concerned, it is just a scan of posts/comments in which words that are unacceptable or spam under the community rules are detected and the content is marked with a tag. The suggested button for filtering can be added, so that all posts marked as spam can be selected in bulk moderation. I will surely add it. I will also add some time to the timeline for discussions with site users to get feedback and suggestions. Thanks:)




One suggestion I have is to think in two main portions about this project.

The first is the experience of using this system as a non-programmer. It can be really challenging to create new features in a way that is readable, easy to understand and use. I recommend that you try presenting some of these features as visual mockups in the way you might write a blog post to moderators on how to use them. You can highlight "what problems this feature solves for moderators" and keep in mind -- people are most likely to be able to adopt and use designs that they're already familiar with. Maybe good places to look for such "familiar" designs include Gmail's spam management, or other places people online often do this kind of activity?

The second is the technical "plumbing" of these systems. By separating this out, you have an opportunity to build an underlying system that works, and if the initial user interface (UI) is confusing, you don't have to rewrite the whole thing, just change the UI. Also, by separating out the technical planning from the UI, it makes the UI portion easier to read and respond to by non-programmers. Third, this division makes it easier to specify unit and functional tests for as opposed to system tests which may be best for UI.

How does this sound? Could you sort your proposal a bit along these lines? Thank you!


Sure, I will separate the UI from the technical implementation so that it becomes easier for everyone to understand. Thanks:)


@warren @bansal_sidharth2996 I have added a new section where I explain the UI design and give an overall summary of the dashboard in easy one-liners. I have made a small prototype to emphasise the design details so that it becomes easier for non-programmers to get insight into the project and its features. Thanks:)




Keshav, you can post answers to the queries above whenever you get time. I suggest you first reformat the proposal according to Jeff's and my suggestions so that you can submit it before 31 March. Thanks



Kindly update the timeline. Timeline got changed recently



OK, whew, big pile of feedback @emash and I just went through with @liz -- sorry for the length but I hope this is valuable input to help guide your proposal revisions! We've collected up a bunch of info on the broad goals of this project, AND specifics on the most critical portions we'd like to prioritize at the start of the project timeline, namely UI work on the /spam page and features for moderation team growth. We're happy to answer questions on this but we incorporated ideas and feedback on your proposal and I hope this is helpful!!!


Project description and purpose:

  1. "A central & comfortable place for moderators to deal with spam"
  2. Current/prior approach: different places based on preference --
    1. At /spam (ish: didn't have a place to review incoming possible spam until recently)
    2. In-situ as you see spam - (as opposed to, "Now I want to do spam moderation")
    3. In response to email notifications (now much more limited, only after 24 hours have gone by)
    4. Cons: lots of email, inability to be up to date (only from time sent), unreliable
    5. Pros: on-demand as spam occurs, fits into some peoples' workflows
    6. Note on multimodal moderation systems: have been discussions of chat-based moderation, browser notification based moderation... but none very popular.
  3. What is the purpose of doing spam moderation?
  4. Maintain integrity of PL.org content
  5. No inappropriate content
  6. No off-topic or low-value commercial activity
  7. Guidelines at https://publiclab.org/moderation
  8. Need spam moderation to not be a burden
  9. Then we can do outreach to grow the moderation team
  10. Should spam be hidden, or visible so as to recruit moderators
    1. We probably started with visible but became hidden as spam became abusive
    2. Once we "have a good system" we can do better outreach to grow the team
  11. What is the buy-in of a volunteer to become a moderator
  12. Create opportunities for stewardship - minimize actions you need to take to participate - lower barriers

Goals:

  1. Discussion: improve primary UI of /spam and tabs
  2. Reverse chronological sort of as-yet-unsorted posts - "fall through cracks" when an item isn't clearly spam or not, and drops off the front page of /spam results
  3. Pagination
  4. Basic functionality of tabs like Wiki, Revisions, Comments, Active Users
    1. Improved sorting for these as well
  5. and UI system tests
  6. Grow moderation team with responsibility-sharing features: tag-filtered moderation, digests
  7. Ability to be a moderator for only some topics of content: only get comment, post notifications for these tags, although still ABLE to moderate site-wide.
    1. topic/tag based filtering can be done via profile tags, just like digest/notifications preferences (see https://publiclab.org/settings)
  8. Digests - provide multimodal interfaces
    1. +1 digests to remind people to come moderate spam
    2. +1 ability to set frequency daily/weekly/monthly?
    3. digests should be possible for tag-filtered moderation activity
    4. Goal: meet people offsite, reminds them to come to us. Polite: provides flexibility & freedom.
  9. Education features:
  10. "Invites" from educators to bypass moderation (has an issue somewhere)
    1. The invite could be to make an account OR to leave a response (nice bc it achieves 2 things in one step)
  11. "Follow" buttons next to peoples' comments/responses/usernames on your own posts

Thank you so much for your proposal!


Thank you so much for this feedback. I will try to incorporate all these goals.




Regarding invites, to expand on this -- would be a way to create a special, relatively short link people could open in a browser (so each student in a classroom could do it) so you can create an account and post a response without needing to be moderated at all. In a sense the teacher "vouches" for people who use this special link.

We'd need:

  1. a format like https://publiclab.org/post?invite=00000 and
  2. a way to generate and display such a link quite large on a screen
  3. would the link expire after a week?

Thanks!




@keshav_sethi0004 This proposal looks brilliant! I would request you to be a little more detailed in your timeline ie add some time in between for documentation and testing instead of doing it one go as it will keep your project more readable and understandable to you and other contributors!

Thank you so much for your feedback. I have changed the timeline as suggested.




@IshaGupta18 @warren @cess @bansal_sidharth2996 Thank you all for your feedback. I have made some changes as suggested. Kindly review it and give your feedback and suggestions. Thank you :)



It's great. It's good to go...Thanks @keshav_sethi0004


