Public Lab Research note


GSoC proposal: Sensor data upload and display library

by IshaGupta18 | 216 views | 21 comments

Read more: publiclab.org/n/18463


About Me

Name: Isha Gupta

Affiliation: Indraprastha Institute of Information Technology, Delhi

Location: Delhi, India

Project Description

Abstract/ Summary (<20 words)

Sensor data upload and display library

Problem

For programmers and non-programmers working in the environmental field, such as the Public Lab community, managing data is not a simple task, and it is harder still for non-programmers. To make it easier for them to manage data and draw conclusions from it, a JavaScript + HTML based library should be developed to turn CSV sensor data into informative graphics like charts. The library will be built and used as a feature on Public Lab but will remain a standalone tool that can be integrated elsewhere. It would be developed separately, like the Editor (https://github.com/publiclab/PublicLab.Editor) or the Inline Markdown Editor (https://github.com/publiclab/inline-markdown-editor).

Index

  1. Description of features
  2. Flow
  3. Part 1: Uploading CSV file through drag-and-drop
  4. Part 2: Getting the uploaded file from input field
  5. Part 3: Parsing
  6. Part 4: Displaying Sample data and getting columns selected
  7. Part 5: Getting ALL the data
  8. Plotting the Graphs
  9. Browsable Time Slider
  10. Displaying per-user data
  11. Exporting CSV files to Google Sheets
  12. Plotting Multiple Graphs from within the same sheet (using different columns' combination)
  13. Report Mailer
  14. Creating charts from previously uploaded files
  15. Saving Chart as an Image
  16. Timeline
  17. More About Me

Description of Features:

  1. A simple menu for choosing the graph type. Different graph types can be displayed, each with a best-suited color theme, such as:
    • Bar
    • Pie
    • Line
    • Radar
    • Doughnut
    • Scatter
  2. Display the CSV data’s columns and give the user the option to choose the columns for which the graph should be plotted.
    • A CSV file is usually very cluttered and not all of its columns are useful.
    • The column headers are shown with a couple of leading values so that the user can select the respective columns for the X and Y axes.
    • The labels for the axes will be picked up from the column header titles.
    • The options given to the user should be simple and not too elaborate, since the purpose is just to plot data on an environmental platform and too many features may drive the user away. With a few basic selections, the user should be presented with a graph.
  3. A browsable/movable slider to display data in a graph:
    • During a particular time period (if time is one of the quantities)
    • In a particular range of values scaled according to the range of data.
    • Selecting the particular range or setting the slider value will magnify the graph for that region, giving a sharper look at the plot.
  4. A single click that converts the uploaded file into a Google Sheet and opens it in a new tab, ready for export (using the Google Sheets API together with the IMPORTDATA function).
  5. Displaying per-user data:
    • A per-user record table showing a list of all CSVs uploaded by the user, on a separate page on Public Lab. Link <publiclab.org/data/CSVs>
    • The record table will have a Download link so that the user can download the uploaded CSV file(s).
    • The user can also delete a CSV file from their record table.
    • A further column could link to the plot created from that CSV.
  6. The user can plot multiple graphs for the CSV file on the same page, using an "Add Graph" button. This would facilitate better analysis and comparisons between different data sets.
  7. A mailer feature. On clicking the Mail Report button, an email is sent to the user containing the created chart as an image and the link to the exported Google Sheet. This gives them a compact, report-like summary of the analysed data.
  8. The user can also create charts from files they have uploaded before. At the time of plotting, they will be given an option to create charts from their previously uploaded CSVs. This promotes reusability of data.
  9. Each graph plotted is given an option to be saved as an image. This makes it very easy for the user to have the analysed data in an easy and accessible form.

Implementation

Flow:

  1. Upload the CSV file through a form submission for the currently logged in user (AJAX).

    • The file is uploaded and saved asynchronously for the current user.
    • The file is recorded in a table with a one-to-many relationship to the user (since one user can upload multiple files for multiple graphs from their account).
    • The file itself is saved in a storage bucket on the server using ActiveStorage.
  2. Get the uploaded file and parse:

    • The uploaded file is retrieved from the form and parsed on the client side using Papa Parse.
    • The file can be on the client's system or it can be a remote file; Papa Parse can handle both.
  3. A sample of the data is displayed from the file

    • We want to get the column header names and some sample data to display, so that the user can choose the columns to be used for plotting a graph.
    • We display a sample of the data (say first 10 rows) for selecting the columns.
  4. Getting the names of required columns and type of graph

    • We now get the names of the required columns for plotting from the user through checkboxes and radio buttons.
    • The user selects the axes of the columns.
    • The user selects the type of graph to be plotted from a graph menu.
  5. Data is compiled

    • The complete data of the selected columns, along with the graph type, is combined in one JSON hash.
    • This hash is made available to Chart.js for plotting.
  6. Chart.js plots the graph with the given data and finally displays it!

Part 1: Uploading CSV file through drag-and-drop

  • We create an <input type="file"> element, which also supports drag-and-drop.
  • The form will be designed so that the user can conveniently either pick the file or drag and drop it.
  • In Plots2, we use the Paperclip gem for uploading and saving files.
  • Once the file is dropped in the drop zone and the Upload CSV button is clicked, it is saved against the user asynchronously in the table created (see below):
    • An AJAX request is sent to the controller action that saves the uploaded file against the user.
    • A success value is returned to show that the action has been executed successfully.
  • I have done something similar for uploading in one of my PRs (https://github.com/publiclab/plots2/pull/4538).

Part 2: Getting the uploaded file from input field

  • The uploaded file is extracted from the drag-drop input field and is made available to the parsing function.

Part 3: Parsing

  • We will be using Papa Parse, a powerful JavaScript library that parses CSV files on the client side (i.e. in the user's browser).
  • To include Papa Parse, we'll use its CDN: https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.6.3/papaparse.js
  • We'll set download: true when the input is a URL, so that Papa Parse fetches the remote file; a local file from the input field can be passed to Papa Parse directly.
  • It can handle large files. To do this, we will traverse the file row by row so that the entire file is not loaded into memory in one go and the browser doesn't crash.
  • We can also enable a worker thread as a precaution.

worker : true

  • We will enable dynamicTyping: true so that numeric and boolean values in the file are converted to their own types instead of all being kept as strings.

  • To get the header names, we can apply a simple heuristic:

    • If the first row contains the header names, all the elements in that row will be of the same data type (mostly string).

    • If that doesn't happen, we assign dummy column names like col1,col2 etc.
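The parsing settings and the header heuristic above could be sketched roughly as follows. This is only a sketch: buildParseConfig, splitHeader and the callback names are hypothetical helpers, not part of Papa Parse, and treating "same data type" as "every cell is a non-empty string" is an assumption about how the heuristic would be applied.

```javascript
// Sketch only: buildParseConfig and splitHeader are hypothetical helpers.

// Build a Papa Parse config for either a remote URL (string) or a local File.
function buildParseConfig(source, onRow, onDone) {
  return {
    download: typeof source === 'string', // only URLs need to be fetched
    worker: true,                         // parse off the main thread
    dynamicTyping: true,                  // keep numbers as numbers
    skipEmptyLines: true,
    step: function (results) {
      // Papa Parse 4.x wraps the row in an outer array; 5.x passes it directly.
      const row = Array.isArray(results.data[0]) ? results.data[0] : results.data;
      onRow(row);
    },
    complete: onDone
  };
}

// Header heuristic: a first row made up entirely of non-empty strings is
// treated as the header; otherwise dummy names col1, col2, ... are assigned.
function splitHeader(rows) {
  const first = rows[0] || [];
  const looksLikeHeader = first.length > 0 &&
    first.every(cell => typeof cell === 'string' && cell.trim() !== '');
  if (looksLikeHeader) {
    return { header: first, body: rows.slice(1) };
  }
  return { header: first.map((_, i) => 'col' + (i + 1)), body: rows };
}

// In the browser (with Papa Parse loaded), usage might look like:
//   const rows = [];
//   Papa.parse(source, buildParseConfig(source, r => rows.push(r), () => {
//     const { header, body } = splitHeader(rows);
//   }));
```

A file whose columns are all text would be misread by this heuristic (its first data row would be taken as the header), so the user should probably be able to override the detected header.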

Part 4: Displaying Sample data and getting columns selected

  • We will display some sample data to the user and ask him/her for the columns to be selected, for which a graph will be plotted.

  • Since for X axis, only a single column can be selected, that will be implemented through radio buttons.

  • For Y axis, multiple columns can be selected, so this will be implemented through checkbox selection.

  • The graph type will be selected by the user through a graph menu (described later)

(When labels for X-axis are to be selected)

(When labels for Y-axis are to be selected)

  • We now need all the data to plot the graph

Part 5: Getting ALL the data

  • We have the column names for which we have to plot the charts and we have the graph type. We now need to get all the data (not just sample) for those columns and compile it into a hash form, so that chart.js can easily extract the information and do the plotting.

  • To do this, we just have to pick out the required columns from the parsed data.

  • The JSON data will be of the following format:

  • X-Axis has one array of values or labels.

  • Multiple data sets can be plotted using chart.js so for the Y-Axis, the user can select multiple columns and they will be taken as arrays of values.

  • With all the data in our hand, we can now give it to Chart.js for plotting.
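A minimal sketch of this compilation step is given below. The helper name compileChartData and the exact shape of the hash (type, labels, datasets) are assumptions made for illustration; the proposal only fixes that the X axis is one array and each selected Y column is its own array.

```javascript
// Hypothetical helper: not part of Papa Parse or Chart.js.
// header: array of column names; rows: array of already-parsed row arrays.
function compileChartData(header, rows, xColumn, yColumns, graphType) {
  const indexOf = name => {
    const i = header.indexOf(name);
    if (i === -1) throw new Error('Unknown column: ' + name);
    return i;
  };
  const xIndex = indexOf(xColumn);
  return {
    type: graphType,
    // One array of labels for the X axis.
    labels: rows.map(row => row[xIndex]),
    // One array of values per selected Y column.
    datasets: yColumns.map(name => {
      const yIndex = indexOf(name);
      return { label: name, data: rows.map(row => row[yIndex]) };
    })
  };
}
```

Since the rows are already parsed, this is just a pass over the data per selected column, which keeps the step cheap even for large files.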

Part 6: Plotting and displaying the graph!

  • With the final, parsed data, we finally use Chart.js to plot the graph and display it to the user!

Plotting the Graphs

Moving to the more graphical part of the project, we now see how exactly we'll be plotting our graphs.

For this, we'll be using Chart.js, a JavaScript library designed specifically for plotting graphs.

We can use the CDN available for chart.js:

<https://cdnjs.cloudflare.com/ajax/libs/Chart.js/2.7.3/Chart.js>

We will be using this version from the CDN because moment.js is already included in plots2, so we don't need the bundled version, which would only add to the page load time.

Including it as a script tag:

It has a variety of different configurations for displaying data through graphs and offers numerous graph types for the user to choose from.

Simple working in correspondence to the data we have:

  • After parsing, we will have the compiled JSON data: the labels for the x-axis as one array of values, and the set(s) of values for the y-axis keyed by their label names, say "dataset1".

  • The JSON will also have the type of graph to be plotted, which the user selects from the graph menu. The Graph Menu will provide basic graph types, with some sub-categories of each graph type that will be useful for the users and easy for them to understand.

  • We'll start from the simple example code in the Chart.js documentation (https://github.com/chartjs/Chart.js) and adapt it to our needs, deciding what to include while plotting the graph, like:

    • Color Scheme

    • Tooltips

    • Information in tooltips

    • Labels on Axes

    • Legends

  • The graphs can be made responsive by wrapping the "canvas" element in a container div and setting its position to relative.

  • Below is the skeletal script for plotting function of graphs. The options part is for the design and will be written tailor-made for our site!
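As a rough illustration of such a plotting function, the sketch below turns a compiled hash into a Chart.js 2.x configuration object. It assumes the hash has the shape { type, labels, datasets: [{ label, data }] }; the helper name buildChartConfig and the palette are hypothetical, and the design options shown are placeholders rather than the final styling.

```javascript
// Hypothetical helper and palette; the option names follow Chart.js 2.x.
const PALETTE = ['#3366cc', '#dc3912', '#ff9900', '#109618', '#990099'];

function buildChartConfig(hash, xLabel, yLabel) {
  const config = {
    type: hash.type,
    data: {
      labels: hash.labels,
      datasets: hash.datasets.map((set, i) => ({
        label: set.label,
        data: set.data,
        // Cycle through the palette so each dataset gets its own color.
        borderColor: PALETTE[i % PALETTE.length],
        backgroundColor: PALETTE[i % PALETTE.length],
        fill: false
      }))
    },
    options: {
      responsive: true,
      legend: { display: true },
      tooltips: { mode: 'index', intersect: false }
    }
  };
  // Axis titles only make sense for these cartesian chart types.
  if (hash.type === 'bar' || hash.type === 'line') {
    config.options.scales = {
      xAxes: [{ scaleLabel: { display: true, labelString: xLabel } }],
      yAxes: [{ scaleLabel: { display: true, labelString: yLabel } }]
    };
  }
  return config;
}

// In the browser (with Chart.js loaded):
//   new Chart(document.getElementById('chart').getContext('2d'),
//             buildChartConfig(hash, 'time', 'value'));
```

Pie and doughnut charts would likely want one color per slice rather than per dataset, and scatter charts expect {x, y} points instead of labels, so those types would need an extra conversion step that is not shown here.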

Different kinds of design options

This is how the menu will look:

  • On clicking a graph type, that type will be selected.

  • With all the information in hand (i.e. column names for the axes and the graph type), we would finally plot the graph.

Browsable Time Slider

  • A time slider or a range slider is used to select a particular range of data and zoom in to the graph in that range.

  • It is typically useful in CSV files where the data size is usually large and a slider helps in giving a better insight into the graphical analysis.

It can be implemented through a plug-in designed for chart.js library which is easy to use and integrate with our Sensor data library.

Github Link: <https://github.com/schme16/Chart.js-RangeSlider/>

Obviously, the design will be changed and tweaked to match the design scheme of the charts.

Here's how it looks in action:

Changing ranges:

To integrate into the graph-plotting JS file, we simply need to create a new class:

Here a and b would be the values we initially want the slider to be set to. Logically these should be the extreme values of the dataset, since initially we display all the data in the range and the user can then adjust it to the view frame they want.

The code will not be used directly and will be implemented separately so that no extra bugs are introduced and it is tailor-made according to our library.
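To make the range logic concrete, here is a small sketch of how the initial values a and b and the zoomed-in data could be computed. It is not the Chart.js-RangeSlider plug-in's API; initialRange and sliceToRange are hypothetical helpers, and the x values are assumed to be numeric (for example timestamps in milliseconds).

```javascript
// Hypothetical helpers sketching the range logic only.

// Initial slider handles: the extreme values of the x-axis data.
// A reduce is used instead of Math.min(...labels) so that very large
// arrays do not exceed the argument limit.
function initialRange(labels) {
  return labels.reduce(
    ([lo, hi], x) => [Math.min(lo, x), Math.max(hi, x)],
    [Infinity, -Infinity]
  );
}

// Keep only the points whose x value lies within [a, b].
function sliceToRange(labels, datasets, a, b) {
  const keep = labels.map(x => x >= a && x <= b);
  const pick = values => values.filter((_, i) => keep[i]);
  return {
    labels: pick(labels),
    datasets: datasets.map(set => ({ label: set.label, data: pick(set.data) }))
  };
}
```

On each slider change, the sliced labels and data could be assigned to the chart's data and followed by a call to chart.update(), so the graph redraws for the selected range only.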

These dependencies need to be resolved; only noUiSlider is new here:

  • jQuery

  • Chart.js

  • noUiSlider

Displaying per-user data

Link: <https://publiclab.org/data/username>

  • This page will display a list of all the CSV files uploaded by a particular user and a link for them to be downloaded.

  • The files will be stored in a bucket in the server using ActiveStorage and will be retrieved from there.

  • For doing this, we will be implementing a one-to-many relationship between a user and CSV files (one user can have multiple files).

  • So a user has_many files and a file belongs_to a user.

(The foreign key links the two tables.)

So we will create a new table called File which will have 4 columns, namely:

  1. File Id

  2. User Id (the user to which the file belongs)

  3. File Name

  4. Path/location of the file in the server

To generate the table:

rails generate model CSVFile name:string user_id:integer path:string

The migration would look something like this:

Storage:

  • There will be a "Download File" button against the name of the file, and on clicking it:

    • A function will be written in the user model, in which the corresponding file name will be looked up in the relationship table, similar to the profile_image column:

    • The file path returned by this function will be set as the href of the "Download File" button, making the file available for download.

Exporting CSV files to Google Sheets

  • The Google Sheets API lets us create a Google Sheet from the CSV data and export it.

Flow:

  • Here, we can export the CSV file data to a Google Sheet and open the newly created sheet, via the API, in a new tab.

  • Since it is a Google Sheet, the exported sheet will require users to log in with their Google accounts (most users have one); they can then view the sheet and edit it as well.

  • To create a new sheet through the API:

  • To import the CSV data into the sheet, we will use the IMPORTDATA(url) function.

  • The url will be the path of the CSV file stored on our server.

  • When we create a sheet, an instance of the spreadsheet is returned as the response, and that JSON contains 'spreadsheetUrl' parameter which has the url of the spreadsheet. We simply open the created sheet in a new tab.
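One way the create request could be put together is sketched below: a request body for the Sheets API v4 spreadsheets.create call with a single IMPORTDATA formula in the first cell. The helper names buildCreateSheetBody and sheetUrlFromResponse are hypothetical, and this assumes the CSV URL is publicly reachable, since IMPORTDATA fetches it from Google's side.

```javascript
// Hypothetical helper: builds the JSON body for a spreadsheets.create
// request, placing one IMPORTDATA formula in cell A1.
function buildCreateSheetBody(title, csvUrl) {
  // Double any quotes so the URL stays a valid formula string.
  const escaped = csvUrl.replace(/"/g, '""');
  return {
    properties: { title: title },
    sheets: [{
      data: [{
        startRow: 0,
        startColumn: 0,
        rowData: [{
          values: [{
            userEnteredValue: { formulaValue: '=IMPORTDATA("' + escaped + '")' }
          }]
        }]
      }]
    }]
  };
}

// Pick the link to open in a new tab from the API response.
function sheetUrlFromResponse(response) {
  return response.spreadsheetUrl;
}

// In the browser, after the user has authorised with Google:
//   window.open(sheetUrlFromResponse(response), '_blank');
```

Sending the request requires an OAuth access token for the logged-in Google user; how that token is obtained is left out of this sketch.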

Plotting Multiple Graphs from within the same sheet (using different columns' combination)

  • To facilitate better analysis, we can support plotting multiple graphs on a single page, something like this, although I think users will prefer similar kinds of graphs for comparison purposes. This would let the user compare graphs easily, as they are on the same page.

  • This would also enable them to create a report-like format, which can be used later.

  • Once the file is parsed and we have displayed the columns for selection, we display an "Add Graph" button next to our sample data table. Whenever the user clicks on it, the number of graphs (n) is incremented by 1 and the user then selects the dataset and graph type for the next graph.

  • We get a different hash for each selected column set, so we just have to run the compilation loop once per graph added by the user. This is light work, since we only need to pick out columns that are already parsed.

  • We will dynamically create canvas elements using JavaScript and run that loop n times to create n multiple graphs!

 

Report Mailer

  • To facilitate use of the charts plotted by the user, we can implement a feature through which users can mail themselves the charts as images, along with the link to the Google Sheet (if one was exported).

  • This would enable them to use this data elsewhere conveniently.

  • To do this, we will be using the sidekiq gem, which is already used in Plots2.

  • As soon as Mail Report button is clicked, the Sensor Data library triggers the controller action by using an HTML element's id. The library will also return the path or src of the image of the chart and the link to the Google Sheet which the API returns at the time of creation. Something like this has been implemented in PL editor:

And applied as an id to a div here:

  • The controller then calls the sidekiq job that performs the mailing asynchronously

  • The sidekiq job calls the model function, which will establish authentication and all related variables.

  • The model calls the mailer function, which will ultimately send the email.

  • send_report is the function inside our mailer, which will specify the attachments and render the email view by the same name.

  • A view by the same name, i.e. send_report.html.erb, will contain the link to the Google Sheet and the rest of the mail design!

Creating charts from previously uploaded files

  • For the ease of the user and to promote reusability, the user can plot charts from the CSV files they uploaded previously.

  • These files are the ones that are saved against the user when they first uploaded them.

  • On clicking the button shown above, a modal containing the list of uploaded files is displayed and the user can "use" a file.

  • After clicking on "use" against a file:

  • A request is made to the server to get the path of the corresponding file (in the same way as when fetching the per-user data).

  • The file's path is then sent to the parsing function for parsing.

  • After the modal closes, the normal flow of showing the sample data continues.

Saving Chart as an Image

  • We can save the chart as a JPEG image, so that the user can download a copy.

  • We can convert any HTML canvas element to an image format using the canvas element's toDataURL() method.

  • This provides a data URL for the image that can be downloaded.

  • This URL is set as the href attribute of a download link, and on clicking it, the chart is saved as an image on the user's machine.
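A small sketch of this step follows. The helper chartDownloadAttributes is hypothetical; it only relies on the canvas toDataURL(type, quality) method and returns the two attributes a download link would need.

```javascript
// Hypothetical helper: returns the attributes for a download link.
// `canvas` is any object with a toDataURL method, e.g. an HTMLCanvasElement.
function chartDownloadAttributes(canvas, baseName) {
  return {
    href: canvas.toDataURL('image/jpeg', 0.92), // data URL of the rendered chart
    download: baseName + '.jpg'                 // suggested file name
  };
}

// In the browser:
//   const attrs = chartDownloadAttributes(document.getElementById('chart'), 'chart');
//   const link = document.getElementById('save-chart');
//   link.href = attrs.href;
//   link.download = attrs.download;
```

JPEG has no transparency, so a chart drawn on a transparent canvas would come out with a black background; painting a white background first, or exporting as PNG instead, may be worth considering.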

Timeline

P.S. At the end of each week, I'll be creating at least 2 FTOs (first-timers-only issues) to welcome new contributors!

Needs

I would need the guidance of mentors and I am open to any kind of help or input from other contributors.

Contributions to Public Lab

I have been actively contributing to Public Lab since December and I have worked on different types of issues, from minor to major and involving different aspects of the project (frontend and backend both). I am an active member of the community and have helped my fellow members with their issues and also opened some first-timers-only issues to welcome new contributors. I have made about 16 commits to the plots2 repository and I am well-versed with the codebase. I have also tried to expand MapKnitter to welcome new contributors and am in a discussion on some projects there as well.

I am working on some issues currently as well and hope to continue to do so!

Here are the issues reported by me:

https://github.com/publiclab/plots2/issues?utf8=%E2%9C%93&q=is%3Aissue+author%3AIshaGupta18

Here are my merged PRs:

https://github.com/publiclab/plots2/commits?author=IshaGupta18

Weekly-Checkin opened by me:

https://github.com/publiclab/plots2/issues/4844

Some activity on Mapknitter:

https://github.com/publiclab/mapknitter/issues/338

https://github.com/publiclab/mapknitter/issues/327

Experience

I am a first-year undergrad pursuing CSE at my college.

I am a part of the core team of Google Developer Student Club, which trains young developers on mobile and web development, and is an excellent platform to organize community workshops and showcases, apart from learning new technologies.

I have worked with Ruby on Rails for about 7 months now. I learned it from scratch at Coding Ninjas (https://codingninjas.in/courses/classroom-web-development-course), so I am well-versed with the platform. I am also comfortable with data structures and algorithms and am fluent in C++, Java, and Python.

Apart from this, on the front end, I have worked with and am confident in HTML, CSS, JavaScript, and Bootstrap.

I am a member of Byld, the development club of my college, where I get to organize and attend sessions about upcoming technologies.

In school, I was part of the cyber society, and have participated in numerous inter-school competitions.

Teamwork

I have worked on several group projects in school and college, especially on IoT. In school, I also took part every alternate year in an environmental project called "Vasudha" (meaning Earth in English), which dealt with a specific type of environmental problem, like non-biodegradable poly bags, global warming or waste management, and required us to conduct surveys, analyze the results and propose solutions.

Apart from this, in open source, I have contributed only to Public Lab, which has shown me how powerful a community is and how helpful people are. I felt so motivated when my peers helped me get my PRs merged, and I felt so good helping them and welcoming newcomers as they worked out their solutions.

I get self-motivated whenever a bug is resolved and I move towards completion of some task I am doing. The best feeling is however, when one of my PRs gets merged. I feel confident about the fact that something I coded is being used somewhere.

Passion

Living in one of the most polluted cities in India, where pollution has given me the sneezes all through my childhood, I find that Public Lab really gets me interested in making the world a much cleaner place.

I am passionate about solving real-life problems, things that help people, even in the tiniest possible way.

Audience

Even in this fast-moving world, there are people who are still not very comfortable using computers to their full potential. Through this project, I want to help non-programmers make use of the data they have and turn it into something more conclusive, useful and informative, with ease, without losing focus on their task, and getting the analysis done quicker.

Commitment

I completely understand that taking this project forward and completing it is a very serious commitment. I will be fully dedicated to completing it because I am very excited about creating something new and useful, and I would be more than happy to work on it!


21 Comments

Hey, @warren @bansal_sidharth2996 this is my draft for GSoC'19 proposal. I have a lot more to do here, more designs and ideas to come in, but in the meanwhile, I would love to have your review on this! Thanks a lot!



Please pardon my formatting, I don't understand how that is happening.



Great proposal. I really love your workflow Isha.



Please include some time for fto, bugs and documentation in the timeline.



Thanks a lot @bansal_sidharth2996 ! I have kept the last week ie 18th to 24th August for that only. I could include some time for FTOs in the middle of the period, around Phase 2 I guess?



Hi, Isha! There's a lot here, thank you, what a deeply thought-out proposal!

I have a few suggestions for starters -- first, what if we made this a pure JS library? So we basically handle CSVs as we do currently, which is that you can drag them in, and they display a graph. But all the code could be in JS to be run on the CSV in the browser! We could save the state (i.e. the choice of graph type, default time bounds, etc) in a string, kind of like an inline power tag: https://publiclab.org/n/15582

OR we could store the settings in the URL, as we do in Image Sequencer: https://sequencer.publiclab.org/examples/#steps=ndvi,colormap

This would modularize a lot of code away from the back-end in a clean way, and the library could be usable by other projects as well. What do you think?



Thanks a lot @warren! I actually thought a lot about this, but then I realized that CSV files are quite big and heavy, and parsing them on the front end, at the client side, would put an unnecessary load on the client's browser. So a heavy task like this could be handled very easily by the server. Besides, we would need to write the file-user logic there anyway, so this could be handled together with it. I am open to doing what you suggested as well. What do you think?



Isha, please read the guidelines for GSoC and Outreachy applications. There are conditions under which you can participate in both programs one after the other, or attempt only one of them in a lifetime. Choose wisely. We are open to both programs.



Yes @bansal_sidharth2996 absolutely! Will definitely do so!



Hi @IshaGupta18 ! Very nice and complete proposal, congratulations! Depending on what happens, I would be glad to help you out as a mentor. The part of adding a Browsable Time Slider is very cool. About what @warren said, I believe making it a pure JS library might be interesting because of the potential reusability, perhaps integrating it with Image Sequencer, the Editor and others in the future. As you mentioned, it might be a quite heavy task, but as far as I know currently we don't have much trouble with excessive load on client-side. If it happens, we can always move it to the server-side with few changes. That said, I think both implementations can be equally successful. Thanks!



Thanks a lot @IgorWilbert Yes I think you are right, we can explore more on this. When I saw PL Editor's code, it was very well written however a lot of it involved using plots2, so I thought that we could make good use of plots2 here, by putting the load on the server. But I am very open to exploring the client side strategy as well, since there are some good CSV parsing JS libraries as well!



Parsing the CSV on the server side is not a good idea. Suppose 100, 500 or more users are using this new feature at the same time. Parsing those CSVs would put a lot of load on the server, whereas on the client side the load is distributed across the clients' computers. Besides loading the server, it would also delay the response to the user: if 500 users submit parsing requests simultaneously, the last user has to wait for all the other CSVs to be parsed. Also, we should not depend on Rails for this. We should build a standard, independent JS library that can be included directly in any project via CDN, npm or yarn, and activated simply by passing the id of a div and calling a function.



I still have some questions on this. Whether it is a good idea to do the heavy lifting of parsing on the user's CPU or it could be well-managed by the server or not. The file will have to be submitted through a form and saved to be displayed in per-user data, that can be done asynchronously as well, so that shouldn't pose as a problem. A problem that could come up, is if the library is not able to handle the file's format ie there are errors in parsing, there will be difficulty in reporting them. But at the server side, something like Sentry can be used to get the logs and fix the errors. So could you please give your points of view on this so that we can find the best possible solution to the problem? Thanks a lot!



Just by way of context, in general, we try to develop lots of sub-libraries in pure JavaScript, like these:

https://github.com/publiclab/leaflet-environmental-layers/ https://github.com/publiclab/Leaflet.BlurredLocation https://github.com/publiclab/inline-markdown-editor https://github.com/publiclab/PublicLab.Editor

This is so that they are more self-contained, maintainable, and have less JS/Ruby mixing. This is a good model and you might look closely at these examples!



There are definitely advantages to server-side programs. But we've found that making lightweight, clearly documented mini-libraries for different purposes can really make for good maintainable code, and after all, we don't really want PublicLab.org's codebase to start growing to encompass all possible uses. We've felt it's better to think of it as the platform and each of these libraries as a kind of "app" running on it. Does that make sense?



Agreed, I think it makes sense to make the library modular.



Okay so I have made some major changes, especially in the flow of parsing at the front end instead of at the backend, apart from adding some new features. Please review and let me know what else can be done and improved. Thanks a lot @warren @bansal_sidharth2996 @IgorWilbert.



Hi @IshaGupta18! This is a really amazing proposal, very detailed 🙌!

You have taken care of all the nitty-gritty to make this project successful. Congratulations 😃!



Thanks a lot @sagarpreet! I am glad you liked it!



@IshaGupta18 Whoa! So detailed yet exact! Going through this was a treat!! 😃

Congratulations!! 👍



Thanks a lot @rexagod! Happy to hear your kind words!


