Public Lab Research note

Intended Purposes for Different Tools and Techniques

by gretchengehrke with donblair, Shannon, warren

gretchengehrke was awarded the Basic Barnstar by warren for their work in this research note.

Technique Strata

(Note: have a better name for this? Please suggest it!)

Environmental data can be used for a variety of purposes, and those purposes place different requirements on a study's data quality (e.g., accuracy and precision). Tools and techniques differ in their maximum performance, which makes them useful for studies with different types of objectives. For example, a qualitative technique should not be used in a study investigating whether an area complies with specific air quality standards, but it could be useful for monitoring a marked aberration above a generally consistent background concentration.

We want to make it clear what the peak performance of each tool (and primary technique) is intended to be, so that we can sort out which kinds of studies each will be useful for. To that end, we are conceptualizing a "Technique Strata." We have a lot of questions, especially with regard to determining the best use-case scenarios, deciding who labels tools (or techniques) at different strata, and communicating intended uses and tool strata.

To kick off this conversation, we have put together a table (found at the bottom of this research note) of different types of research agendas and an estimate of the data quality each requires. Our table is similar to one suggested by a team of EPA researchers, found on pages 33-40 of the EPA 2014 Air Sensor Guidebook, but it is tailored to the kinds of projects that are more common in Public Lab. For example, we don’t currently have environmental monitoring projects whose expressed purpose is hot spot identification, but we do have projects building an alert system for when a threshold value (for particulate matter, conductivity, turbidity, etc.) has been reached. This is a generalized table, and it will expand with new kinds of explorations, such as potential new research in personal exposure monitoring.
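As a rough illustration of the alert-system use case, here is a minimal sketch of a threshold alert. It assumes readings arrive as a plain list of numbers. The function name `threshold_alerts` and its parameters (`threshold`, `k`, `baseline_n`) are hypothetical, not part of any Public Lab tool. It supports two modes: a fixed action level, and an aberration-from-background rule where the background is estimated from the first few readings.

```python
from statistics import mean, stdev


def threshold_alerts(readings, threshold=None, k=3.0, baseline_n=10):
    """Return the indices of readings that should trigger an alert.

    Hypothetical helper. If `threshold` is given, flag readings at or above
    that fixed value. Otherwise, estimate the background from the first
    `baseline_n` readings and flag readings more than `k` sample standard
    deviations above the background mean.
    """
    if threshold is None:
        if baseline_n < 2 or len(readings) < baseline_n:
            raise ValueError("need at least baseline_n (>= 2) readings to estimate background")
        baseline = readings[:baseline_n]
        threshold = mean(baseline) + k * stdev(baseline)
    return [i for i, value in enumerate(readings) if value >= threshold]
```

The background mode matches the "marked aberration above a generally consistent background" idea. The choice of mean plus three standard deviations is an assumption, and a real project would pick its own rule and baseline window.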

Important Questions

Here are some big questions for discussion:

  1. Who designates tool levels or strata?
  2. How do we communicate when different techniques, using the same tool, have different levels of possible data quality?
  3. How rigorous do the pre-label tests need to be in order to determine the appropriate stratum for a tool or technique?
  4. Where should the open community critique and discussion about tool/technique claims live?
  5. Should there be a sort of “certification committee” for different tools?
  6. Do we need to develop trust inside or outside of Public Lab for these tool and technique designations?
  7. Should there be a “Discussion on status” link or warning on tool wiki pages and in the online store, and a disclaimer about the limits of a tool’s potential data quality?
  8. Should there be a direct label of “Intended for X” on each tool?

Please comment with your thoughts.

Contextual Indicator
  • Estimated precision*: +/- 50%
  • Sample size: small
  • Remarks: This is mostly used to ascertain presence or absence, or to get a general idea of the parameter in question. It is not quantitative and is mostly used for educational purposes.

Threshold Alert
  • Estimated precision*: +/- 30%
  • Sample size: sufficient to observe the background and aberrations from the background
  • Remarks: This technique should include a calibration, though perhaps not with every measurement, and may not include sample replicates, etc.

Quantitative Indicator
  • Estimated precision*: +/- 15-20%
  • Sample size: sufficient to observe patterns and aberrations
  • Remarks: This technique includes quantitative calibration and an assessment of accuracy and precision. It also includes data quality assurance and quality control (QA/QC) parameters, such as sample replicates and blanks.

Regulatory Recognition
  • Estimated precision*: dependent upon the federal reference method (FRM), usually +/- 10%
  • Sample size: dependent upon the FRM, usually large
  • Remarks: This technique adheres to federal standards, or produces data of sufficient quality to be officially recognized by certain regulating authorities. This stratum has the strictest data quality objectives.

* Precision means the replicability of a measurement. For studies that are trying to document when an abnormal event occurs, the exactness of the measurement is less important than demonstrating that it differs from the normal condition. For studies that are trying to demonstrate smaller variations in values, such as geographic canvassing, measurements may need to be more precise in order to observe differences in true values.
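To make the precision column concrete, the sketch below estimates precision from replicate measurements and reports the most demanding stratum whose precision bound is met. Two assumptions apply. First, it treats "+/- X%" as the percent relative standard deviation (RSD) of replicates. Second, it uses 20% as the Quantitative Indicator bound. The names `STRATA_BOUNDS`, `relative_std_dev`, and `precision_stratum` are hypothetical. Precision is only one criterion: calibration, QA/QC, and FRM adherence are not checked.

```python
from statistics import mean, stdev

# Bounds taken from the strata table; interpreting "+/- X%" as the RSD of
# replicates is an assumption. The regulatory bound really depends on the FRM.
STRATA_BOUNDS = [
    ("Regulatory Recognition", 10.0),
    ("Quantitative Indicator", 20.0),
    ("Threshold Alert", 30.0),
    ("Contextual Indicator", 50.0),
]


def relative_std_dev(replicates):
    """Percent relative standard deviation of replicate measurements."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return 100.0 * stdev(replicates) / abs(m)


def precision_stratum(replicates):
    """Most demanding stratum whose precision bound the replicates meet.

    Only precision is checked here, not calibration, QA/QC, sample size,
    or adherence to a federal reference method.
    """
    rsd = relative_std_dev(replicates)
    for name, bound in STRATA_BOUNDS:
        if rsd <= bound:
            return name
    return None  # worse than +/- 50%: no stratum's precision bound is met
```

For example, duplicate readings of 90 and 110 give an RSD of about 14%, which falls within the Quantitative Indicator bound.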



Great post. I have lots of thoughts on this, some of which we've discussed already, but first I wanted to say that in an open source, community science model, I think the idea of a “certification committee” is potentially a very separate conversation. We definitely need to think about how tools can be evaluated both easily by new users and rigorously by folks who want to closely and independently test claims. But I think the idea of a centralized body as the main mechanism for such evaluations is at odds with our desire to democratize and decentralize expertise and to make knowledge production accessible. If some contributors choose to set themselves up as "trusted reviewers" and build a reputation on the basis of transparency, thoroughness, and excellent documentation, that's great. But I wouldn't necessarily trust a review committee whose only claim to rigor was based on formal accreditation (like having a PhD).

I guess I'm more interested in scaffolding a set of transparent and legible norms for making claims about a tool's fitness to answer a particular question, and for testing such claims and sharing such evaluations. So I'm very excited about this ongoing discussion and how it might bring, at the very least, a clear and accountable labeling standard to PL techniques!

@warren, I definitely agree that the idea of a centralized committee would be at odds with the democratization of science. I was envisioning something more along the lines of a team of significant contributors carrying out specific tests, sort of like the beta testing program for the Oil Testing Kit, rather than a hierarchical body of self-proclaimed experts.

Yes, yes, yes to developing a scaffolding for transparent norms in assessing the abilities of a tool and technique for various purposes. I think it will have to be an iterative process, starting with testing the tool itself under controlled circumstances and then in various field scenarios. I'm also interested in sussing out how to communicate the capabilities and limitations of techniques and tools independently of each other (as independently as possible).

Thanks for bringing this together @gretchengehrke and @shannon and many others. This is going to really help the "Tool Page Standardization" efforts. Excited to see this line of thought develop.

The topic is kind of overwhelming when you think of all the possibilities, but maybe one model would be to start compiling how different METHODS were used successfully and which instruments/tools they used.

There could be various qualitative scales for accuracy, portability, response time, and even difficulty level. I think a method/application-based score is more appropriate, since instruments/tools can be used in multiple ways. For example, the formaldehyde test kit is a compilation of components, and its accuracy will vary depending on your tube manufacturer, your flow meter, and a multitude of conditions, like sample flow and volume. If this were in a database format (or even wiki-linked), a user could also cross-reference and see all the methods in which any one tool is used.

Some elements to consider reporting:

  • Method Name
  • Equipment used (multiple pick lists for cross reference)
  • Media (pick list for cross reference: air, water, soil, etc.)
  • Accuracy: Scale of 1 (ballpark figure) to 10 (best known method)
  • Response time: Scale of 1 (lab analysis required) to 10 (<1 sec frequency)
  • Level of difficulty: Scale of 1 (turn on and use) to 10 (elaborate setup)
  • Portability: Scale of 1 (only with a forklift) to 10 (carry in your pocket)
  • Project citations: e.g. published paper, project, or "lab test by [name]"
  • And then you could be more specific in a narrative format, like "Achieved 5% of range accuracy..."
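One way to hold these reporting elements in a database-like form is a small record type plus a cross-reference index from equipment to methods. This is a minimal sketch; the class `MethodRecord`, the helper `methods_by_equipment`, and the field names are hypothetical and mirror the list above rather than any existing Public Lab schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The four 1-10 scales from the list of reporting elements.
SCALED_FIELDS = ("accuracy", "response_time", "difficulty", "portability")


@dataclass
class MethodRecord:
    """One method entry (hypothetical structure mirroring the list above)."""
    name: str
    equipment: list          # pick list, used for cross-referencing
    media: list              # e.g. ["air"] or ["water", "soil"]
    accuracy: int            # 1 = ballpark figure ... 10 = best known method
    response_time: int       # 1 = lab analysis required ... 10 = <1 sec frequency
    difficulty: int          # 1 = turn on and use ... 10 = elaborate setup
    portability: int         # 1 = only with a forklift ... 10 = carry in your pocket
    citations: list = field(default_factory=list)
    narrative: str = ""      # e.g. "Achieved 5% of range accuracy..."

    def __post_init__(self):
        # Reject scores outside the 1-10 scales.
        for attr in SCALED_FIELDS:
            value = getattr(self, attr)
            if not (isinstance(value, int) and 1 <= value <= 10):
                raise ValueError(f"{attr} must be an integer from 1 to 10, got {value!r}")


def methods_by_equipment(records):
    """Cross-reference: map each piece of equipment to the methods that use it."""
    index = defaultdict(list)
    for record in records:
        for item in record.equipment:
            index[item].append(record.name)
    return dict(index)
```

Looking up a single tool, such as a flow meter, in the returned index lists every recorded method that uses it. This is the cross-referencing described above.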

As for how to create this, a "product review" format might be the quickest way to build content: an average of multiple opinions would be reported up front, but a user could still read the individual opinions.
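The "product review" idea can be sketched as averaging each 1-10 score across reviewers while keeping every individual review available. This assumes each review is a plain dictionary. That layout and the function `summarize_reviews` are hypothetical.

```python
from statistics import mean

DEFAULT_FIELDS = ("accuracy", "response_time", "difficulty", "portability")


def summarize_reviews(reviews, fields=DEFAULT_FIELDS):
    """Average each scored field across reviews, keeping the individual reviews.

    `reviews` is a list of dicts such as
    {"reviewer": "...", "accuracy": 6, "comment": "..."} (hypothetical layout).
    Reviews that omit a field are skipped for that field's average.
    """
    averages = {}
    for field_name in fields:
        scores = [review[field_name] for review in reviews if field_name in review]
        averages[field_name] = round(mean(scores), 1) if scores else None
    return {"n_reviews": len(reviews), "averages": averages, "individual": list(reviews)}
```

Keeping the raw reviews next to the averages lets a reader check whether one outlying opinion is driving a score.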

Newer instruments would fit into this model too. For example, if a new PM monitoring device had just been developed and there were no field experiences with it yet, someone could do a lab test and use those results to assess the instrument.

I like this idea a lot, @DavidMack. One thing we've been talking about on staff is developing a sort of environmental monitoring guide. One section of it would include a table to help choose a method based on all the things you listed, plus things like support network and tool performance longevity. In addition to the 1-10 scales, a table with real data would be useful too: for example, accuracy and precision based on a standard reference material (if applicable) and n>10 independent measurements, or something like that.
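For the "real data" table, one common way to summarize repeated measurements of a standard reference material has two parts. Accuracy is reported as percent bias from the certified value, and precision as percent RSD. This is a minimal sketch. The function name `srm_performance` is hypothetical, these statistics are one reasonable choice rather than an agreed Public Lab standard, and the n>10 rule is taken from the comment above.

```python
from statistics import mean, stdev


def srm_performance(measurements, certified_value, min_n=11):
    """Summarize accuracy and precision against a standard reference material.

    Accuracy: percent bias of the mean from the certified value.
    Precision: percent relative standard deviation (RSD) of the measurements.
    `min_n=11` encodes the "n > 10 independent measurements" suggestion.
    """
    n = len(measurements)
    if n < min_n:
        raise ValueError(f"need at least {min_n} independent measurements, got {n}")
    if certified_value == 0:
        raise ValueError("certified value must be nonzero")
    m = mean(measurements)
    return {
        "n": n,
        "mean": m,
        "bias_percent": 100.0 * (m - certified_value) / certified_value,
        "rsd_percent": 100.0 * stdev(measurements) / abs(m),
    }
```

Reporting bias and RSD separately keeps accuracy and precision distinct, which matches how the strata table treats precision on its own.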

I wonder whether the method categorization (and I agree, method makes more sense than tool) should be separate from this sort of guide, though. While I think ease of use is extremely important to communicate, it is a separate issue from the data quality potential of a given method. That said, ease of use and long-term in-field performance would strongly affect the kind of sampling campaign that would be possible, which in turn would affect the kind of study the method could be used to conduct, so maybe they do all belong in one document. Food for thought.

BASIC INFORMATION
  • Project name
  • Project objective (qualitative, quantitative, both)
  • Media to be measured (air, water, land, combination, people’s opinions, etc.)

DATA GRANULARITY
  • Units of measurement
  • Frequency of measurement
  • Depth of field measured (in three-dimensional space)

KAPPA DELTA
  • Describe any changes upon test-retest or tester 1 / tester 2 deployment
  • Season deployed
  • Social context of deployment

EVALUATE
Subjective report from each tool/technique user via Likert scales:
  • Ease of use in field: 1-3
  • Sensitivity: 1-5
  • Specificity: 1-10
  • Validity to purpose: 0, 1, 2
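Assuming "KAPPA" in this template refers to Cohen's kappa, the agreement between two testers (or between a test and a retest) on categorical ratings could be computed as in the sketch below. For example, the ratings might be "Validity to purpose" scores of 0, 1, or 2. The function name `cohens_kappa` is hypothetical, though the formula is the standard one.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters (or two deployments) rating the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's category counts.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_e == 1:
        # Both raters used the same single category for every item; kappa is
        # undefined here, and returning 1.0 (perfect agreement) is a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

A kappa near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance.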
