That’s a great question, but it is a little bit like asking “How do you cook a good meal?” There are lots of different kinds of good meals, and listing the most important rules to follow depends on a lot of variables (e.g., ingredients on hand, kitchen gear available, who’s coming to dinner). Scientific questions vary a lot more than meals do, so listing the most important rules for designing studies to answer those questions is a challenge. If I wanted to cook a good meal and there was a chef in the house, I might show him the kitchen and what was in the fridge, and tell him who was coming for dinner and then ask his advice. That would be easier on the chef than asking him how to cook.
Experiment or observation
Not all scientific knowledge is acquired by doing experiments. Sometimes simple observation will reveal a pattern no one noticed before. Using an Arduino data logger to record how some variable changes over time probably does not count as an experiment, but it can still tell us something important. For example, if your data logger indicates that there is always more dust in the air during daylight than during nighttime, you might have learned a critical clue about your environment. This new knowledge might suggest an additional observation or an experiment.
In one sense, the above data logging observation could be an experiment because you measured something (dust) under two different treatment effects (day and night). Technically it is not an experiment (see below) and there are several reasons why this is not even a good observation (see below), but it can still advance our knowledge and suggest further observations or experiments.
What makes a good observation?
For the dust observation above to be convincing, you would have had to state in advance that you were looking for differences between day and night. Looking for differences anywhere you can find them after the data are collected is the P-hacking or data-dredging issue. If you collect enough data there will always be some statistically significant pattern present.
To convince yourself that the pattern you found is real and meaningful, you would want to observe it more than once (replication). A data logger can collect data for many days, so one type of replication would be easy—Is the pattern present on all the days or most days (is it present on more days than might be expected by chance)?
To convince yourself that the pattern you found is a general pattern, you would want to observe it under more than one condition. Is the pattern present at only one place or everywhere? Is the pattern present during only one type of weather or all types? Is the pattern present only in summer or during all seasons? Is the pattern present on weekends as well as weekdays?
Each one of the questions above suggests a further observation. When you know exactly what pattern you want to demonstrate, then you can design new types of observations to determine if the pattern is reliably present.
Designing the observation or experiment
Answering each of the questions above about how dust varies (or doesn’t) would require a different type of observation. Some require many replicate data loggers in different places, some require collecting data during two or more types of events. Where and when to place your loggers can be critical. The goal is to plan observations in a way that will make it harder, not easier, to find the result you are looking for. For example, if you think buses are a source of dust, you wouldn’t place your loggers on bus routes close to the road and loggers on non-bus routes farther from the road. Instead of having a goal of convincing the world that your hypothesis about dust sources is true, strive to convince yourself.
The more times you can replicate the observations, the more convincing the result will be. The optimum number of replicate loggers, or logging days, or logging locations will depend on how much the result varies among loggers, among days, or among locations. You probably won’t know how many replicates you need until after you have made the observations, which is a problem. Hey, nobody said science was easy (Okay we all know who says that, but they’re not scientists, are they?).
You will have to select where, when, and how to collect data. Some decisions about these things might be affected by conscious or subconscious bias, so random or systematic selection is safer. This can not only eliminate your personal bias, but also prevent unknown patterns in the environment from introducing bias. For example, if you collect data near bus routes for a week and then on non-bus routes for the second week, a surprise transit strike could alter the conclusion of your study. If you choose days randomly for either bus or non-bus routes, there is less chance of your result being affected by an unknown underlying cause. Sometimes it’s too much trouble to do this, so we accept that our study will not be as robust as it could have been.
Most of the things we are still trying to learn about are complex. Observing that there is more dust in the daytime than at night might be straightforward, but learning why is harder. For example it might be because there is more traffic or more construction during the day. Or it might be because the air is dryer during the day. Or it might be because it is windier during the day. Or it might be because your dust sensor makes higher readings when the sun is shining on it. It might also be due to something you don’t know about, but think hard about the things you do know about and make decisions that take those things into account. Sometimes you can’t disentangle the correlated variables without experiments designed to control each one.
When is it an experiment?
Technically, experimental studies differ from observational studies because an experiment involves manipulating variables. If you block traffic on a road every other hour during the day and measure dust levels continually through the day, that is an experiment. If you fill a pickup truck with ashes and drive past half of your sensors every 30 minutes, that is an experiment.
If you measure dust, temperature, humidity, and wind every 10 minutes for a year, that is not an experiment—it’s an observation. If you do that with 100 replicate sensors randomly located throughout an area, it’s still an observational study even though you might be able to demonstrate a meaningful and statistically significant correlation among dust, temperature, humidity, or wind. If you analyze your data and find that there is more dust on weekdays than on weekends, that does not make it an experiment.
If you put half of your sensors next to two-lane highways and half in leafy neighborhoods, it is still not an experiment because you have not manipulated anything. If you put half of your dust sensors 20 feet above the ground, and the other half one foot above the ground, you have not manipulated anything except for the way you are observing something. However, if your study is about how sensors work in different environments, then the previous two examples would be manipulating something relevant to your hypothesis and that would be an experiment. So context is relevant and there is some wiggle room associated with classifying a study as an experiment.
Experimental design is a discipline that most science students get introduced to. It is closely related to statistics because the goal is to design experiments which have the power to answer the question being asked by performing a statistical analysis of the final data. Observational studies are part of this discipline. Because observational studies give the researcher less control over the variables that might affect the result, they have less power to answer questions. But they can still provide answers.