Public Lab Wiki documentation

Water Quality Primer with Jeff Walker


On Monday, March 5th, Jeff Walker, who recently earned his PhD from Tufts University, presented a Water Quality Primer to some of the Open Water team at MIT. This is a record of that talk; a YouTube video of it may also be available.


Jeff gives us some information about his background. He did his undergraduate degree at Cornell in Science of Earth Systems, focusing on hydrology, but then decided to be an engineer and came to MIT. He has done work on coral in St John's and worked for CDM, a company in Cambridge. He then went to Tufts for his PhD and ended up building web applications for modeling. He was frustrated with the modeling tools in the field, which are based in DOS and Fortran. Since Open Water is going to be measuring water quality, he wanted to show us how he thinks about it and how he visualizes water quality data. He will show us examples of water quality projects he has worked on.


Where did the field come from? Dr. Steve Chapra gave a lecture titled "Rubbish, Stink and Death", which names the three core problems of water quality. Rubbish refers to aesthetics, like trash and algae blooms. Stink refers to ecosystem degradation: organisms die and become stinky. Death refers to the public health aspect. The field dates back to the pollution of the Thames: the "Great Stink" of 1858 gave birth to the current field of water quality.


The Science of Water Quality

  1. Physics
  2. Chemistry
  3. Biology

You have to think about all three of these things when you are thinking about water quality. How is the water moving and mixing (physics), how are the compounds reacting (chemistry), how do the living organisms interact (biology)? Also fish can control water quality by eating phytoplankton. That's originally why Jeff found it exciting - it's complicated.


The other way to think about it is: Fate and Transport. Fate: What's the fate of the pollutant? and Transport: Where did it go?


He shows a diagram of several chemical compounds - a "Nutrient Algae Growth Model". There are many pathways for a particular substance to take. The O's in the diagram stand for Organic. Dead things become nitrogen and phosphorus, which then have other pathways through the system. There are a lot of moving pieces. Runoff comes in as "External Loading" in this diagram.


When is Water Quality a Problem?

This depends. You can't just call something a pollutant without specifying who, how much and when. We think of phosphorus as a pollutant, but we are made of phosphorus. It's considered a pollutant because it drives algae growth, which drives low dissolved oxygen. Cyanobacteria are a big problem around here. He shows a slide of the Charles River with too much blue-green algae in the water. This is due to too much phosphorus. These algae produce neurotoxins (problematic for swimmers, dogs, etc.).


It's a challenge to figure out how to avoid the neurotoxins - how much phosphorus can we let into the river? That's a difficult question. Similarly, low levels of metals are OK. You have to put everything in the context of human health and the health of ecosystems.


There are a lot of different types of pollutants. One of the most basic distinctions is between conservative and non-conservative pollutants. A conservative pollutant is something like salt, which doesn't react with other things. It doesn't settle out or get degraded. It just travels with the water. Another word for that is "tracer" -- we can use it to see where the water came from. A lot of other pollutants are non-conservative. If you measure phosphorus, for example, you are measuring what came from the land and then went through all those processes on the slide (e.g. it degrades and changes form in the river). The majority of pollutants are non-conservative. The other distinction is particulate versus dissolved: particulate pollutants are bound to solid particles and can settle out, while dissolved pollutants stay in solution and move with the water.
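To make the conservative / non-conservative distinction concrete, here is a minimal sketch. It assumes simple first-order decay for the non-conservative pollutant, which is a common simplification and not something from the talk; the function name, decay rate and concentrations are all hypothetical.

```python
import math

def downstream_concentration(c0, travel_time_days, decay_rate_per_day=0.0):
    """Concentration after a given travel time, ignoring dilution.

    decay_rate_per_day = 0 models a conservative tracer (e.g. salt);
    a positive rate models a non-conservative pollutant with
    first-order loss (an assumed, simplified form of "fate").
    """
    return c0 * math.exp(-decay_rate_per_day * travel_time_days)

# Hypothetical starting concentration of 100 units, 5 days of travel.
salt = downstream_concentration(100.0, travel_time_days=5)
reactive = downstream_concentration(100.0, 5, decay_rate_per_day=0.2)
print(salt, round(reactive, 1))
```

The tracer arrives unchanged, while the reactive pollutant arrives at a fraction of its starting concentration, so a downstream measurement of it mixes up "how much came in" with "what happened along the way".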


There are all kinds of pollutants, and they come from different sources. "Point source" is anything that comes out of a pipe, like wastewater treatment plants. Stormwater runoff is not exactly "point source" even though it often comes from a pipe. "Non-point source" is basically runoff and is more pervasive in the environment. It's easier to measure what comes from a point. It's hard to know where the non-point source stuff comes from or how to measure it.


Understanding Water Quality Data


Jeff will show us some examples of his projects that have used visualizations to understand water quality data. In the vast majority of them, the fundamental question is about variability: why do the concentrations change, in space and over time? The goal is to understand the variability of water quality concentrations in the data.


Time scales are very important - there are changes in water at all these scales:

  1. Long-term (annual)
  2. Seasonal (monthly)
  3. Events (weekly) - stormwater runoff
  4. Diurnal (daily)


The Everglades

He shows the slide about the Everglades. Non-point source pollution comes off houses and farms and goes into canals. They have built massive reservoirs and treatment areas. Pollution goes to the reservoirs, then into constructed wetlands. The idea is to use plants to filter out phosphorus. The wetlands are meant to buffer the phosphorus from entering the Everglades park. Water goes from the Everglades into Florida Bay. They have spent billions of dollars since the 1980s to create this system. The tour was a constructed, animated model.


Now Jeff shows us a map of the sites. The red points are the inflow stations. The water flows south into a marsh, then into canals & reservoirs and then into the Everglades Park. They have a massive data set that they use for all the sites.

Mark asks - How unique is that project because of the built infrastructure?

Jeff - Yes, I'll show you the consequences of that in a second. They have totally changed the hydrology. Which also changed the water quality.


Jeff shows the slide with many phosphorus measurements. These all use log scales, because there are a few values that are very high and some that are very low. The questions here are:

  1. What are the spatial gradients?
  2. What are the long-term gradients?


Here we would put the stations in order along the gradient. He shows a box plot of all data over time. Stations are on the x axis, running from North to South. You can see how phosphorus declines from North to South. The box plot shows a spike in the Miami Canal, which is caused by the canals. "Geo mean" (geometric mean) and the median value are each a sort of average of the distributions. He then plots those on a map (his next slide), where the color and size of each symbol correspond to the amount of phosphorus in the water. You can clearly see on the map where the values spike in the canals. It turns out a lot of the polluted water is flowing around the marsh and then into the canals, so it is not flowing strictly North to South.
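As a small illustration of why the geometric mean and median are used as the "average" for this kind of data, the sketch below compares them with the ordinary mean. The sample values are made up, with one very high reading like the ones described above.

```python
import statistics

# Hypothetical phosphorus concentrations (ug/L) at one station;
# one very high value, as is typical for skewed water quality data.
samples = [8.0, 10.0, 12.0, 15.0, 400.0]

arithmetic_mean = statistics.mean(samples)
geo_mean = statistics.geometric_mean(samples)   # exp(mean(log(x)))
median = statistics.median(samples)

print(round(arithmetic_mean, 1), round(geo_mean, 1), median)
```

The single high value pulls the arithmetic mean far above what most samples look like, while the geometric mean and median stay close to the typical values. This is the same reason the plots use log scales.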


One of the complicated things about water data visualization is that you normally want to see it both spatially and over time, but that can be difficult. The next graph is for a single inflow station. The top two graphs show a monthly time scale and then an annual time scale. "Seasonal Trend Test" - take the data and look at it by month (middle graph on the slide). Then you do a test to see if there's a significant decrease for each month. The slope of the March line is not as steep as June's. If you plot that (graph on bottom), then you see whether there's a decreasing trend or not. So what the graph shows is that the decrease happens in the wet weather and that it mostly happens from June to December. This is a trend analysis.
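The notes don't say which statistical test was used. One common choice for this kind of month-by-month analysis is a seasonal Kendall approach, so the sketch below assumes that: it computes the Mann-Kendall S statistic within each month across years. The data and function names are hypothetical, and the significance calculation is left out.

```python
from collections import defaultdict

def sign(v):
    return (v > 0) - (v < 0)

def mann_kendall_s(values):
    """Mann-Kendall S: (# increasing pairs) - (# decreasing pairs)."""
    return sum(sign(values[j] - values[i])
               for i in range(len(values))
               for j in range(i + 1, len(values)))

def seasonal_s(records):
    """records: (year, month, concentration) tuples.
    Returns {month: S}, computed across years within each month."""
    by_month = defaultdict(list)
    for year, month, conc in sorted(records):
        by_month[month].append(conc)
    return {m: mann_kendall_s(v) for m, v in by_month.items()}

# Hypothetical data: June declines year over year, March does not.
records = [(2001, 3, 20), (2002, 3, 22), (2003, 3, 19), (2004, 3, 21),
           (2001, 6, 60), (2002, 6, 50), (2003, 6, 40), (2004, 6, 30)]
per_month = seasonal_s(records)
print(per_month, sum(per_month.values()))
```

A strongly negative S for a month means concentrations in that month have been going down over the years; an S near zero means no consistent direction. Comparing months this way keeps the seasonal cycle from hiding or faking a long-term trend.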


The next graph shows the trend analysis plotted spatially. This shows decreasing trends in the in-flow. They had really wanted to see this result -- to see if their work to reduce the runoff was effective. This is a way of looking at the spatial gradients of temporal variability.


Onondaga Lake

This is in Syracuse, NY. It was once the most polluted lake in the country. They have a similar question - is the water quality getting better or worse over time?


At Onondaga Lake they want to measure bacteria. The graphs show that once precipitation reaches a threshold, bacteria levels multiply. Here he split the graphs into multiple seasons. For example, water quality in the NE is only a problem May - October because that's the recreational season. But they saw pretty similar patterns across the two seasons. The red line here is the state standard - they want to get below the red line. But they have a big challenge because they have concentrations of bacteria even during dry weather. This could be from birds, animals, septic tanks and so on.
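A minimal sketch of the wet-weather versus dry-weather comparison described here. The precipitation threshold, the standard, and the sample values are all illustrative assumptions, not the actual Onondaga numbers.

```python
# Hypothetical samples: (precipitation in the prior 48 h in inches,
# bacteria count per 100 mL). Threshold and standard are illustrative.
WET_THRESHOLD_IN = 0.25
STANDARD = 200

samples = [(0.0, 50), (0.1, 260), (0.0, 120),
           (0.6, 900), (1.2, 2400), (0.3, 180)]

def exceedance_rate(samples, wet):
    """Fraction of wet (or dry) weather samples above the standard.
    Assumes at least one sample falls in the requested group."""
    group = [count for rain, count in samples
             if (rain >= WET_THRESHOLD_IN) == wet]
    return sum(count > STANDARD for count in group) / len(group)

dry_rate = exceedance_rate(samples, wet=False)
wet_rate = exceedance_rate(samples, wet=True)
print(round(dry_rate, 2), round(wet_rate, 2))
```

A non-zero dry-weather rate is the challenge mentioned above: some source other than storm runoff is contributing bacteria.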

This lake also has phosphorus problems. There's a Superfund site on the shore, where there used to be an industrial site. Originally the phosphorus here came from the wastewater treatment plant on the south shore. Phosphorus can come from air, street, and fertilizer runoff. Combined sewer overflows - where one pipe takes the sanitary sewer plus the stormwater - are a big problem. Those pipes are only so big, so when it rains a lot there are two options - back up or release valve. These normally release into the river instead of backing up. It happens once a month in the Boston area. This is really difficult to fix - you have to dig up all the streets.


Now he shows the slide with four separate graphs. They did the same thing in this analysis where they break it up into seasons. Here it showed that bacteria were actually increasing in dry weather. The graph at the bottom shows this by mapping the trend slopes over time.



Mystic River: Stormwater Events


This is for the "event scale" of data - i.e. the weekly scale of data collection. This was collected using an autosampler, which looks like R2D2 and lets you take a regular set of samples over the course of an event. You stake a tube in the middle of the river and then it pumps water over to R2D2.


He shows the blue graph. The blue line is the flow, which comes from USGS data. This is called a hydrograph, showing the rate of flow over time. The orange points are phosphorus. This illustrates "the first flush". You would expect to get the highest concentrations of phosphorus at the height of the storm. But that's not true - right at the beginning of the event, before the flow peaks, you get a spike.
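One simple way to look for a first flush in event data is to compare when the concentration peaks with when the flow peaks. This is only a sketch with invented numbers; real autosampler and USGS series would need to be aligned to the same time steps first.

```python
# Hypothetical storm event, one value per sampling interval.
flow = [2, 3, 6, 12, 20, 26, 22, 15, 9, 5]               # e.g. cfs
phosphorus = [30, 95, 140, 110, 80, 70, 60, 50, 40, 35]  # e.g. ug/L

def peak_index(series):
    """Index of the largest value in the series (first one on ties)."""
    return max(range(len(series)), key=series.__getitem__)

flow_peak = peak_index(flow)
conc_peak = peak_index(phosphorus)
lead = flow_peak - conc_peak   # > 0: concentration peaked before flow
print(f"concentration peak leads flow peak by {lead} intervals")
```

A positive lead is the first-flush signature: the concentration spike arrives on the rising limb of the hydrograph rather than at its height.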


There is a gap in the data because the battery died. Then he shows the turbidity graph. It's not quite as peaky but it increases about the same amount. The reason they are measuring turbidity is that you can use it as a surrogate. If you correlate phosphorus to turbidity, then you can use that relationship to estimate phosphorus from turbidity. This is where the Riffle could be really helpful.


The next graph shows the relationship between phosphorus and turbidity. This would differ between water sources, so you would have to test at each site. First you would have to measure phosphorus at the site, and then you would know the relationship between events, phosphorus and turbidity. If we develop a model that predicts those relationships per site, then we could take many fewer phosphorus samples and still have a good indicator of its quantity. Measuring phosphorus and turbidity with lab analysis costs $100 per measurement, so it would be great if you could do it for cheaper.
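The surrogate idea can be sketched as a per-site regression. The example below assumes a log-log linear relationship between turbidity and phosphorus, which is a modeling choice for illustration and not necessarily what was fitted in the talk; the paired samples are invented.

```python
import math

# Hypothetical paired lab samples from one site.
turbidity_ntu = [2.0, 5.0, 10.0, 20.0, 50.0]
phosphorus_ugl = [12.0, 25.0, 45.0, 80.0, 170.0]

def fit_log_log(x, y):
    """Least-squares fit of log(y) = intercept + slope * log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return my - slope * mx, slope

intercept, slope = fit_log_log(turbidity_ntu, phosphorus_ugl)

def estimate_phosphorus(turbidity):
    """Predict phosphorus from turbidity with the fitted site relationship."""
    return math.exp(intercept + slope * math.log(turbidity))

print(round(estimate_phosphorus(15.0), 1))
```

Once a relationship like this is calibrated for a site, a cheap continuous turbidity reading can stand in for many of the expensive lab samples. The fit only holds for that site and for the range of conditions that were sampled.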


If you take enough good samples to explain the variability between the stations, then you could do lower cost sampling and use that to explain the variability with more spatial resolution or in a more continuous way. People who want to go kayaking or swimming could just pull up their app to see in real-time what the health of the water is.

This could be done with phosphorus and turbidity but Jeff and Elizabeth say that bacteria is much harder to predict.


Ecosystem Metabolism

This is from Jeff's undergrad work. There was a wastewater treatment plant here. They decided to spray their waste on trees. The phosphorus bound to the soil particles but the nitrogen could not, so the waste seeped through the groundwater and they had a massive plume of nitrogen in the groundwater. This is very hard to clean. It started to seep out into a harbor. They wanted to see how the nitrogen affected the ecosystem metabolism - do you get a lot more algae blooming? More death?

They did a dye study where they used rhodamine to measure the plume.


He shows the slide measuring dissolved oxygen. You can see measuring over a week that DO increases during the day and decreases at night. During the day plants are growing, they make oxygen. Then the sun sets but things are still living and respiring. You get the up and down cycle every day. You can convert this to a measure of the productivity of the ecosystem - how much plant matter is produced each day? How much respiration is there?
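A rough sketch of how a daily dissolved oxygen curve can be turned into production and respiration numbers. This is a heavily simplified version of the open-water diel oxygen idea: it ignores gas exchange with the atmosphere, assumes respiration is constant through the day, and uses invented hourly data and sunrise/sunset hours.

```python
# Hypothetical hourly dissolved oxygen (mg/L) for one day, starting at
# midnight. Sunrise/sunset hours are assumed, and gas exchange with the
# atmosphere is ignored, so this is only a rough illustration.
do_mg_l = [7.0, 6.8, 6.6, 6.4, 6.2, 6.0,      # 00:00-05:00 night
           6.0, 6.4, 6.9, 7.4, 7.9, 8.4,      # 06:00-11:00 day
           8.9, 9.3, 9.6, 9.8, 9.9, 10.0,     # 12:00-17:00 day
           9.8, 9.6, 9.4, 9.2, 9.0, 8.8]      # 18:00-23:00 night
SUNRISE, SUNSET = 6, 17   # hour indices

# Respiration rate from the night-time decline before sunrise (mg/L per hour).
respiration = (do_mg_l[0] - do_mg_l[SUNRISE - 1]) / (SUNRISE - 1)
# Net daytime change = gross production minus daytime respiration.
net_day_rate = (do_mg_l[SUNSET] - do_mg_l[SUNRISE]) / (SUNSET - SUNRISE)
gross_production = net_day_rate + respiration

daily_respiration = respiration * 24
daily_gross_production = gross_production * (SUNSET - SUNRISE)
print(round(daily_respiration, 2), round(daily_gross_production, 2))
```

The night-time drop gives the respiration rate; adding that back to the daytime rise gives the gross production by plants and algae.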


He put a probe out there for a year to get the max and min for each day, which is what the next slide shows - "Changes in carbon". Each point on this graph is one month - how much carbon was produced and respired during those months? It's more in summer because the temperature is high, and then it drops off in winter as the temperature falls. All of these processes are temperature dependent.

Then they did this for four more years. As the nitrogen came out of the water, you got more respiration. This was a way of using a continuous sensor.


Conductivity and Temperature

He sees two benefits: 1) Tracer: Conductivity is a direct measurement of # of ions. Not just sodium and chloride. It could be nitrite and calcium. If you have water from a parking lot vs water from the ground they have different signatures and this gives you a way to see how the signatures combine to reflect mixing of water. This means you can use them as a tracer.
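The tracer idea can be written as a two-source mixing calculation. This sketch assumes only two water sources and that conductivity mixes conservatively; the conductivity values and the function name are hypothetical.

```python
def mixing_fraction(mixed, source_a, source_b):
    """Fraction of water from source_b in a two-source mixture,
    assuming conductivity behaves conservatively."""
    return (mixed - source_a) / (source_b - source_a)

# Hypothetical specific conductance values (uS/cm).
groundwater = 300.0
parking_lot_runoff = 1500.0
stream_sample = 600.0

runoff_share = mixing_fraction(stream_sample, groundwater, parking_lot_runoff)
print(f"{runoff_share:.0%} of the sample is runoff")
```

With more than two sources you would need additional tracers, one more for each extra source.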

2) Density: With these two things we can compute the density of water and he's going to tell us why that's interesting.


Jeff shows several graphs of temperature data from the Mystic. You don't see patterns here. But if he plots it as a function of the day of the year - i.e. the x axis is 0-365 - then you see really clear patterns. This is almost always a really useful way to look at water quality data.
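Re-indexing a multi-year series by day of year takes only a few lines. The readings below are invented; the point is just the conversion from a calendar date to a day-of-year value.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (date, temperature in deg C) readings across two years.
readings = [(date(2009, 1, 15), 1.5), (date(2009, 7, 15), 24.0),
            (date(2010, 1, 15), 2.5), (date(2010, 7, 15), 25.0)]

by_day = defaultdict(list)
for day, temp in readings:
    day_of_year = day.timetuple().tm_yday   # 1-366
    by_day[day_of_year].append(temp)

# Average of all years for each day of the year.
climatology = {d: sum(t) / len(t) for d, t in sorted(by_day.items())}
print(climatology)
```

Plotting values against `day_of_year` stacks every year on top of the others, which is what makes the seasonal pattern stand out.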


He shows a graph of Patrick's data of conductivity. You see a spike in 2010. Jeff then took all that data and did box plots by month and you see that spikes happen in Jan/Feb because they salt the roads in those months.
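Box plots by month boil down to grouping the readings by month and computing quartiles for each group. A minimal sketch with made-up conductivity values (not Patrick's data):

```python
import statistics
from collections import defaultdict

# Hypothetical (month, conductivity in uS/cm) readings.
readings = [(1, 900), (1, 1400), (1, 1100), (1, 2000),
            (2, 1000), (2, 1600), (2, 1200), (2, 1800),
            (7, 400), (7, 450), (7, 500), (7, 420)]

by_month = defaultdict(list)
for month, value in readings:
    by_month[month].append(value)

# Quartiles are the numbers a box plot draws for each month.
box_stats = {m: statistics.quantiles(v, n=4)
             for m, v in sorted(by_month.items())}
for month, (q1, med, q3) in box_stats.items():
    print(month, q1, med, q3)
```

In this invented data set the January and February boxes sit well above July, the same kind of winter road-salt signal described above.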

Don asks: Is precipitation enough? Like do you need depth and flow?

Jeff: You don't often have that data. We are trying to get USGS to put more flow gauges out there. The flow data reflects if there was a storm. All airports have rain gauges. Other places like Hubbard Brook have their own.

Mary: We have them close in the experimental forest.

Jeff: Some people use radar but that only shows reflectivity. If we could more effectively measure precipitation then that would REALLY change things. The City of Cambridge has their own rain gauges. There's a consortium of hydrologists that are trying to do Hydrologic Informatics. They run their own servers. They end up with tons of small projects.

Mark: One of the more useful products I've found is from NOAA.

Mary: It's forecast models at 5km resolution (5km grids), based on the WRF model. They forecast 48 hours in advance and then they revise it. We harvest it for all North America but they don't save it. We are now saving it. And that's all public access.




Why is Water Density Useful?

Jeff shows a graph that shows the stratification of the lake via temperature. The graphs show temperature over time by depth (top left), combined with salinity (top right), from which you can calculate density (bottom left). When you get to June the density of the water is higher at the bottom of the lake than at the top. Then there is no mixing between top and bottom, which means that no O2 can get from the top to the bottom of the water. The lake is stratified and you get low O2 in the bottom water. The nice warm water sits on top of the lake and the two layers don't mix. Then you get to October and there's a day called "turnover" - when the bottom and top reach the same temperature the lake water flips over, which can create a fish kill.
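A sketch of the density calculation behind a stratification plot. The fresh water term uses the pure-water formula of Tanaka et al. (2001); the salinity term is a rough linear correction of about 0.8 kg/m^3 per g/kg, added here as an assumption for illustration and not a full equation of state. The temperatures are invented.

```python
def water_density(temp_c, salinity_g_per_kg=0.0):
    """Approximate water density in kg/m^3.

    Fresh water term: Tanaka et al. (2001) formula for pure water.
    Salinity term: a rough linear correction (~0.8 kg/m^3 per g/kg),
    an assumption for illustration, not a full equation of state.
    """
    fresh = 999.974950 * (1 - (temp_c - 3.983035) ** 2 * (temp_c + 301.797)
                          / (522528.9 * (temp_c + 69.34881)))
    return fresh + 0.8 * salinity_g_per_kg

# Hypothetical summer profile: warm surface water, cold bottom water.
surface = water_density(24.0)
bottom = water_density(8.0)
print(round(surface, 2), round(bottom, 2), "stratified:", bottom > surface)
```

The larger the density difference between the two depths, the more energy it takes to mix them. As the surface cools in the fall the difference shrinks toward zero, which is the turnover condition.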

This is why Jeff is really excited about measuring density with the Riffle. There is salty water that comes through the dam near MoS in Boston, so they are interested in measuring whether there is a salt wedge. You can relate density and conductivity via statistical calculations that are in the literature.

Density is useful for estuaries as well to see how much mixing there is between freshwater and ocean water. Doing it at the mouth of an estuary would be really interesting.

This gives you your physics part of it - how does the water mix? It's hard to measure flows and mixes. It's very helpful to understand how the water moves in these systems.



How is density measured? With temperature and conductivity?

Jeff says it's like the Riffle - basically a commercial sensor probe at different depths. The "Ecosystem Metabolism" slide shows a fancy probe that sinks into the water. This probe also has an ambient light sensor - that would be a useful thing to include in the Riffle. For example, more light underwater = more growth, so you want to measure that.

Don asks: If you are measuring light over time, could that be a proxy for depth? Jeff: Not really, because it gets complicated by cloud cover. It helps if you have one sensor at the surface and one at depth. With light you are also measuring "light extinction", which gives an indication of turbidity and algae. Depending on their depth, algae get different amounts of light. If surface algae cut out the light then nothing can grow below them.
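With one light sensor at the surface and one at a known depth, light extinction is usually summarized with the Beer-Lambert relationship I(z) = I0 * exp(-k * z). A minimal sketch with invented readings; the function name is hypothetical.

```python
import math

def extinction_coefficient(surface_light, light_at_depth, depth_m):
    """Light extinction coefficient k (1/m) from I(z) = I0 * exp(-k * z)."""
    return math.log(surface_light / light_at_depth) / depth_m

# Hypothetical simultaneous readings from a surface and a 2 m sensor.
k = extinction_coefficient(surface_light=1000.0, light_at_depth=135.0,
                           depth_m=2.0)
print(round(k, 2))
```

Because both sensors see the same clouds, the ratio cancels out changes in incoming light, and a rising k over time points to more turbidity or algae in the water column.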

Don describes a Secchi disk - a black and white patterned disk. You drop it into the water and mark the depth at which you can no longer see it. Very cost-effective. Some organizations do that every day. Someone from Whites Pond in Concord, MA, measures that every day.

Don: Maybe that's a comparable model for how to use low-tech tools to get high quality data sets.

Jeff: Yes, but to get a good data set you need consistency across it. If it's inconsistent it's very hard to get insight from it. Consistency between sites is important, and so is a consistent rate of sampling.

Mark: What Jeff pointed out about the first flush is neat to see, and you can see that in the conductivity data. One thing we might want to try to do with our data is to look for those signatures. We could use conductivity as an indicator of “first flush” (not necessarily reflective of phosphorus, but reflective of ions).

Hi all - if there are folks in the doc - we are moving into the interactive part of the event so we won't be documenting it as well. Thanks for reading…… We will publish this as a wiki shortly.

Questions:
- When is first flush?

Audiences:
- Citizens
- "Scientists"
- Watershed Managers
- Agencies / EPA
- Educators
- Fishing communities
- Journalists & students

Issues:
- First flush
- Salt runoff
- CSO
- Thermal Plume
- Bacteria
- Algae blooms
- Stratification / fish kill
- Ag runoff

Places:
- Tanzania
- Colombia
- MA
- NH
- Tidmarsh
- Mystic
- Merrimack River
- Lobster fishing community
- Hurricane Island
- Lake Champlain












