Human Verification Investigation / Open Climate Fix MapRoulette Project

Hey y’all,

I’ve identified a couple of OSM tasks to help clean up solar panel data and turned them into MapRoulette challenges contained in a MapRoulette project. (Let me know if this link doesn’t work)

If someone else wants to help out with this I can add you as a Project Manager if you give me your OSM username.

Also feel free to use this thread for suggestions for challenges and feedback on the existing challenges contained here. Most of what I created is trial and error, just investigating MapRoulette as a human verification pipeline, and trying to get an estimate on the speed at which tasks like these get done.


Hi Tyler, the link doesn’t work (it’s an admin link). This link works

I had a go. About 50% of them seem doable and 50% not, I guess, which is not bad. It’s quite a big task.

I’m still not quite sure how important the convert-nodes-to-areas issue is. IMHO the add-new-areas-suggested-by-ML is much more important, so I guess we wouldn’t want to over-promote other tasks? Remembering that volunteer effort is finite.

It’s great to try out MapRoulette for this though, and I think this tells us MapRoulette will hopefully be ideal for the add-new-areas-suggested-by-ML process? Do you think so?


I’ve just had a play with the challenge (thanks for making it). It does feel quite unintuitive and slow. On one task, I noticed that when I open the area in iD, the panels are already added, but in MapRoulette it’s not easy to show the existing PV tags.

Given that creating areas is much more time-intensive for DeepSolar and/or humans, I too think that we would be better off focusing on adding points for new PV not currently there. Residential installs do have a fairly limited range of possible sizes. For solar farms, we can ask for areas and more detailed tags.

It would be great if we could make some improvements to either MapRoulette or HOTOSM Tasking Manager so that all the user had to do was click on the solar panels they see (with existing ones already shown), and then add any additional tags to them.

Very cool, I’ve just registered and had a go. The walkthrough for OSM was very neat.

I skipped several that were doable, for example a solar array where hundreds of panels were individually marked. Verifying an ML-suggested layer could be a lot quicker. But I didn’t mind this task too much; I could see myself doing quite a few.

So that’s just one of the tasks, I’ve made a couple different ones, and yeah it’s pretty big. I agree with your point about not exhausting our finite human verification. For an example of the “add-new-areas-suggested-by-ML” task, I had one challenge visible but I realize I didn’t have a good writeup of instructions for it, so I just now hid it from search. But here’s what it would look like:

There’s a problem though: I’m only able to upload 400 of the 5000 tasks in my challenge to MapRoulette, and I’m unsure why. My hunch is that if you upload your own GeoJSON it dies after a while. I’ve filed a GitHub issue, but the issue queue for MapRoulette is really large, so I’m not sure if we should hold our breath. I could try a reload, but MapRoulette’s admin system is very slow and breaks old challenge links if you delete, etc.

I think I might try HOTOSM Tasking Manager to see if the whole system is faster and feels better. Otherwise we might be better off forking something and adapting it to our use case.

But yeah, I agree about the importance of the ML task verification over changing nodes to areas. I mostly made the node-to-area challenges as an investigation to see how quickly the work could be done and how large it is (slower and larger than I expected).

RE existing PV tags, for the area challenges, it should automatically filter out areas whenever the task is rebuilt, because it’s just generated from an overpass query. (But rebuilding the challenge takes a while if it’s large, so I haven’t done it yet)

For the human verification of ML challenge, I don’t currently filter by existing panels, so it’s possible the panels it wants you to add are already mapped. Filtering is on the roadmap, but it’s not as important with Austin, as there really aren’t that many existing PV panels in OSM. It’s also a little difficult to do, because an OSM tag may be just a node that doesn’t represent the true area of the panel array, and if the node doesn’t fall within the image boundary but part of the panel does, then it’s not immediately obvious that you should have filtered that task out. (But you shouldn’t filter it out if there’s another unmapped panel on the other side of the image.) So for now I’m just erring on the side of showing the human too many false positives, even though this might contribute to human turnover.
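To make the boundary problem concrete, here is a minimal Python sketch of one way it could be handled. It is not what the challenge currently does. The `BBox` class, the `nearby_mapped_nodes` function, and the `margin_deg` value are all hypothetical names and numbers chosen for illustration. The idea is to look for mapped PV nodes slightly outside the image as well, and to flag the task rather than drop it, since another unmapped panel may still be in view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Image footprint in degrees (hypothetical helper, not a MapRoulette type)."""
    west: float
    south: float
    east: float
    north: float

    def expanded(self, margin: float) -> "BBox":
        # Grow the box on every side so nodes just outside the image still count.
        return BBox(self.west - margin, self.south - margin,
                    self.east + margin, self.north + margin)

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def nearby_mapped_nodes(tile, mapped_nodes, margin_deg=0.0002):
    """Return mapped PV nodes inside the tile or within margin_deg of its edge.

    margin_deg is an assumed tuning value, not something taken from the thread.
    A non-empty result means "possibly already mapped", not "safe to drop".
    """
    search_area = tile.expanded(margin_deg)
    return [(lon, lat) for lon, lat in mapped_nodes
            if search_area.contains(lon, lat)]
```

A task with a non-empty result could be shown with a note such as "a panel near this image may already be mapped", which keeps the task visible while warning the reviewer.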

It doesn’t seem like there’s a publicly available way to create your own challenges for HOTOSM, so I think if we were to do it we might have to host our own instance. (I personally think that climate change is a humanitarian cause, but the people who run it might not see it that way)

When I get a chance I’ll try to participate in some of the existing tasks they host to see if the workflow seems like it’d be a good fit for our verification tasks.

Also, I can make the classification data I’ve collected in Austin available if someone else wants to contribute to the investigation of different verification tools.


From this page it looks like maybe you need more privileges to create tasks. Maybe we can investigate the feasibility in our own instance, and then lobby for access to the HOT instance if we want to go forward with it to get a larger pool of users. could possibly be the place for us, if our cause doesn’t fit the humanitarian label well enough. I’ll try to contact their host. (Although there seem to be fewer users there)

The MapRoulette task is finally polished and all the data for Austin uploaded this time! Here’s the link to it again:

Also, someone suggested Zooniverse as a possible outlet for us:

@TylerBusby333 that task looks awesome! It looks easier to follow along with than the last time I had checked in (which was mid to late last week).

I’ll be interested to see how the results look, and in particular how many of the solar panels get labeled with the “label” tag. From my experience with crowdsourcing, annotations are often more accurate and reliable when chaining multiple simple tasks, rather than asking the annotators to do everything all at once (or giving them the option). This is certainly more work up-front, so if it can be avoided that would be nice. In this case I could imagine breaking the task down as follows:

  1. Verify that there is a solar panel in the provided image (i.e. confirm the ML results).
  2. Verify the location of the solar panel in the provided image. This could be either putting a very loose bounding box around it or a tight bounding box that could be treated as an accurate area. If it were a loose bounding box this could then be used to point annotators to the correct location in a future task (although I don’t think the loose bounding box would be terribly useful, even as an intermediate task).
  3. For those that have areas, ask annotators to add labels to them.
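As a rough illustration of the chaining idea, the steps above could be expressed as a small routing function that decides which simple task a candidate should go to next. This is only a sketch: the `Candidate` fields and the task names are made up for the example and do not correspond to anything in MapRoulette or OSM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One ML-suggested location (hypothetical record for illustration)."""
    confirmed: Optional[bool] = None  # None means no human has reviewed it yet
    has_area: bool = False            # a tight outline has been drawn
    has_labels: bool = False          # descriptive tags have been added


def next_task(candidate: Candidate) -> Optional[str]:
    """Return the name of the next simple task, or None when nothing is left."""
    if candidate.confirmed is None:
        return "confirm_panel"   # step 1: is there a solar panel in the image?
    if not candidate.confirmed:
        return None              # rejected by a human, nothing more to do
    if not candidate.has_area:
        return "draw_area"       # step 2: locate or outline the panel
    if not candidate.has_labels:
        return "add_labels"      # step 3: add labels to existing areas
    return None
```

Each annotator only ever sees one of the three questions, and a candidate moves forward one step at a time.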

Some things to think about as we iterate on the design of the task.

Zooniverse looks interesting. Would its main benefit at this point in time be additional visibility of the MapRoulette tasks?

@sallamander hard agree on chaining simple tasks, I’ve been trying to stay on top of the leaderboard just to make sure the tasks seem doable, etc. And drawing areas from nothing can feel cumbersome, but also it kind of feels like duplicating work to create a node and then go back and create an area from it in a different task. It’s hard to know how the users feel without getting a lot of people to test it. I wrote the instructions in such a way that it’s okay to create nodes but areas are better. But I realize ambiguity might increase attrition as well.

But yeah I am in agreement on having each task be very simple and chained together, this is also conducive to the way that MapRoulette tasks can be generated by queries. For example, you can query solar nodes that aren’t areas, and say like “please make this an area” and then you can query areas without a certain label, and say “please add this one label”. I think the important thing is that each task leaves the pv object in a stable state where progress can be directly queried, otherwise it might be difficult to query things like “loose bounding area” if we aren’t allowed to add tags to represent things like imprecision.
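For anyone who hasn’t written these queries before, here is a minimal Python sketch that builds the two kinds of Overpass QL query described above. It assumes solar panels are tagged `power=generator` plus `generator:source=solar`, and that areas are mapped as ways (multipolygon relations are ignored). The function names, the 60-second timeout, and the example key are my own choices for illustration, not the queries actually used in the challenges.

```python
# Tag filter shared by both queries (assumed tagging scheme for solar panels).
SOLAR_FILTER = '["power"="generator"]["generator:source"="solar"]'


def bbox_clause(south, west, north, east):
    """Overpass bounding boxes are ordered (south, west, north, east)."""
    return f"({south},{west},{north},{east})"


def nodes_needing_area(south, west, north, east):
    """Query for solar panels mapped only as nodes: 'please make this an area'."""
    return (
        "[out:json][timeout:60];\n"
        f"node{SOLAR_FILTER}{bbox_clause(south, west, north, east)};\n"
        "out body;"
    )


def areas_missing_key(missing_key, south, west, north, east):
    """Query for solar ways that lack a given key: 'please add this one label'."""
    return (
        "[out:json][timeout:60];\n"
        f'way{SOLAR_FILTER}[!"{missing_key}"]'
        f"{bbox_clause(south, west, north, east)};\n"
        "out center;"
    )


# Example: a rough box around Austin, with an example key to look for.
print(nodes_needing_area(30.2, -97.8, 30.3, -97.7))
print(areas_missing_key("generator:output:electricity", 30.2, -97.8, 30.3, -97.7))
```

Because each query only matches objects still in the "before" state, re-running it after people have worked through the tasks shows directly how much is left.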

Another problem I foresee though, is that the OSM community is very particular about the quality of what’s allowed on the map.
My understanding:

  • Node representing solar panel - okay
  • Area representing solar panel - okay
  • Not a lot of descriptive labels on a solar panel - okay
  • Loose bounding box - probably not ok, they don’t have a way to mark things as imprecise/not finished/not confident and are philosophically opposed to this in my experience
  • Directly imported from ML algo - probably not ok as well
  • Sourced from proprietary data - definitely not ok

This means that for any data that is imported, found by ML, of questionable accuracy, or from a proprietary source, we’ll always have to maintain our own database of PV locations (whether it’s one mega db that all OCF projects contribute to, or one specific to each project), and then use human verification to port what we’re allowed to from it to OSM. (Once it’s in OSM, the imprecise data in the staging database would get replaced with the human-verified version.)

I already have to have this architecture for my project SPDW, it’s just a little more primitive than what I suggest here, but I foresee the pattern of: data source -(machine)> staging database -(human)> OSM, recurring a lot throughout the pv mapping projects. It might benefit the organization to create a single database where all sources get staged in, just to standardize the process for human verification from different sources, having a place where algorithm source and imprecision are allowed tags, and a place where proprietary data sources are allowed. Although the end goal of this is to get all the data we can into OSM.
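To sketch what a record in such a staging database might carry, here is a small Python example. It is not the SPDW schema; the `StagedPanel` fields and the `eligible_for_osm` rule are assumptions that simply encode my understanding of the OSM constraints listed earlier (no proprietary sources, no unverified ML output, no imprecise geometry).

```python
from dataclasses import dataclass


@dataclass
class StagedPanel:
    """One PV location in the staging database (hypothetical schema)."""
    lon: float
    lat: float
    source: str               # free text, e.g. "ml" or the name of a dataset
    proprietary: bool         # True if the source licence forbids sharing
    human_verified: bool      # a person has confirmed it against imagery
    imprecise: bool = False   # e.g. a loose bounding box or a rough location


def eligible_for_osm(panel: StagedPanel) -> bool:
    """Assumed rule: only verified, precise, non-proprietary data moves on."""
    return panel.human_verified and not panel.proprietary and not panel.imprecise


staged = [
    StagedPanel(-97.74, 30.27, "ml", proprietary=False, human_verified=False),
    StagedPanel(-97.75, 30.28, "ml", proprietary=False, human_verified=True),
    StagedPanel(-97.76, 30.29, "vendor", proprietary=True, human_verified=True),
]
ready = [p for p in staged if eligible_for_osm(p)]
print(len(ready))
```

The staging side can hold tags that OSM would reject (algorithm source, imprecision, proprietary origin), while the eligibility check is the single gate between the two.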

Anyway, sorry, big tangent. I haven’t had a chance to take a deep look at Zooniverse. I think its added benefits might be additional visibility and possibly allowing us to structure our tasks in such a way that they aren’t directly in the context of OSM.

@TylerBusby333 I appreciate the tangent here, I found it very informative :grin:. I wasn’t aware of how easy it is to query for things in OSM (I’m still getting up to speed on the platform), and that alleviates a lot of my concerns. I was imagining that it would require a good amount more infrastructure to parse potentially incomplete results from one task to then kick off a second task, but that doesn’t sound like the case, which is great!

RE the below, I agree. Given that the database you suggest might take in tags produced by ML (i.e. inference results), specifically DeepSolar, maybe it makes sense to start working towards something that brings together the raw data side of things and the inference side of things?

It might benefit the organization to create a single database where all sources get staged in, just to standardize the process for human verification from different sources, having a place where algorithm source and imprecision are allowed tags, and a place where proprietary data sources are allowed. Although the end goal of this is to get all the data we can into OSM.