Format for OpenStreetMap submissions

What format and tags do we think we should include when making submissions to OSM?
Existing guidelines:

I think the most important thing for our purposes is that each solar panel is mapped as an area instead of a node (unless we have accurate power output figures for the node). This way we can estimate the power output by using some watts-per-area constant (or maybe using some sort of non-linear function approximator, as larger solar panel areas might represent utility-scale panels, which may imply different average efficiencies, spacings, etc.). Additional information that would be useful on top of the area would be inclination angle and orientation angle as described by this graphic:

I’ve been told that the tag “generator:orientation=227” is currently used for orientation, but I’ve never seen it in use, so that might end up being a larger tagging push, or something we can build into our own verification app. I don’t know whether there is currently a tilt tag we could use; some research would have to be done here. Estimating tilt from satellite imagery would be a little harder than estimating orientation, but the location context could help: e.g. utility-scale arrays are probably either flat (pretty obvious from satellite) or tilted optimally, while residential arrays are usually on roofs pitched at around 45 degrees.

My only concern with this tagging scheme is that, to my knowledge, there’s no way to attach a confidence level to a tag, to show that you aren’t 100% sure. That would be helpful when building datasets or deciding whether or not to estimate something in ML.

See also the extra drafting we’ve already been doing on the OSM side - the following page is intended as a tagging guide for UK people manually mapping things:

So the tags recommended in that wiki page would tell you:
roof vs ground mounted;
number of modules;
start date (full date, or just the year, if it’s knowable);
power output.

All tags are optional in the world of OSM. (If you want to know what tags are in active use, the website analyses the OSM database.) Do remember that for manual volunteer mapping, there’s a complexity tradeoff - if we can make the job simple for people, it’s more likely to happen. That’s our consideration for the above list (“orientation” may even be quite tricky for people to label). This is why I suggested tagging the number of modules: it’s easier for people to count modules than to estimate surface area in square metres!

You’re talking here about automated detection rather than manual. The demands are a bit different in that case, but we can be broadly compatible.

Indeed there’s no good way in OSM to record a level of certainty. The data in OSM is supposed to be high-quality and could be manually improved even after it’s been entered, so it’s not really part of the data model. You could include it in some kind of ad-hoc tag, but I wouldn’t recommend relying on this.

You would also want to tag the source of the detection, for audit. This would specify which imagery was used as well as some indication of the algorithm. e.g.

I think tagging as an area instead of as a node is great if you can do it. IMHO, don’t be sad if you can’t. There may be a tradeoff: for the smallest PV installations, it may be very hard to get a good guess of the shape. A badly-guessed shape should not be imported into OSM (and might be rejected), but an honest point should.
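When an installation is mapped as an area, its surface area can be derived from the node coordinates alone, which is what an area-based power estimate needs. The sketch below is one simple way to do it (not anything OSM or this group has standardised): it projects a small ring of (lon, lat) pairs onto a local equirectangular plane and applies the shoelace formula. The function name and the mean-Earth-radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumed constant)


def polygon_area_m2(ring):
    """Approximate area in m^2 of a small ring of (lon, lat) pairs.

    The ring is projected onto a local equirectangular plane centred on its
    mean latitude, then the shoelace formula is applied. Adequate for
    rooftop- or farm-sized polygons; not intended for very large areas.
    """
    ring = list(ring)
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # OSM closed ways repeat the first node at the end
    if len(ring) < 3:
        raise ValueError("need at least three distinct vertices")
    lat0 = math.radians(sum(lat for _, lat in ring) / len(ring))
    pts = [
        (EARTH_RADIUS_M * math.radians(lon) * math.cos(lat0),
         EARTH_RADIUS_M * math.radians(lat))
        for lon, lat in ring
    ]
    # Shoelace formula: twice the signed area is the sum of edge cross products.
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1])
    )
    return abs(twice_area) / 2.0
```

Multiplying the result by whatever watts-per-area constant (or fitted function) is chosen gives the area-based estimate discussed at the top of the thread. A badly guessed shape will of course give a badly guessed area, which is another reason not to import one.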

1 Like

P.S. it’s useful to see the wattage assumptions Russ used to plot his OpenInfraMap solar heatmap:

The “generator:output” tag is used if present. Otherwise, if generator is an area, output is calculated at 150 W/m^2. Generators mapped as points are considered to output 1 kW.
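Restated as code, the rule quoted above looks roughly like this. It is a paraphrase of the description, not OpenInfraMap's actual source; the function and parameter names are made up here, and the output tag is assumed to have already been parsed into watts.

```python
from typing import Optional


def openinframap_style_output_w(output_w: Optional[float] = None,
                                area_m2: Optional[float] = None) -> float:
    """Estimate a solar generator's output in watts using the quoted rule.

    output_w: value of the generator output tag in watts, if present.
    area_m2: mapped area in m^2, or None when the generator is a point.
    """
    if output_w is not None:
        return output_w          # an explicit tag always wins
    if area_m2 is not None:
        return 150.0 * area_m2   # areas: 150 W per square metre
    return 1000.0                # points: a flat 1 kW
```

One consequence worth noting: a large solar farm mapped as a point counts as 1 kW, the same as a single domestic rooftop, which is one source of the large errors raised later in the thread.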

1 Like

So for the certainty portion of what I said, I was more referring to a user guessing the orientation/tilt from a satellite image, than the algorithm guessing that. The current human verification scheme we were thinking of was giving a human a bounding box where we’re fairly confident that there’s a solar panel, and then the human maps out the area or plops a solar node on the location.

However, it might be possible to generate precise solar panel positions for import, as you mention, which could be quickly passed or failed by a human and later verified, expanded into areas, and tagged with orientation/tilt/panel count by a human. I was just under the impression that OSM in general was opposed to data imports, so the verification I envisioned had the human creating all the OSM data from hints we gave.

OK - the “certainty” comment applies to human as much as automatic: there’s no particularly well-structured way to store it.

I agree entirely that the best way is most likely to be finding locations for people to map manually. Yes, OSM is opposed to bulk imports. Manually validated imports can be OK, if handled delicately. But I foresee the same approach as you described (“giving a bounding box…”).

1 Like

Not sure if it’s too late to change, but the preferred convention for recording orientation would be clockwise from due North (i.e. compass bearing). The convention of measuring from due South is common in certain models (e.g. transposition of GHI into plane of array) because it simplifies the trig equations, but it gets confusing in the Southern hemisphere. If we’re aiming for global coverage eventually, then compass bearing would be preferable.
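Converting between the two conventions is a one-line shift, as long as the sign convention of the south-referenced angle is known. The sketch below assumes the common modelling convention of 0 = south, east negative, west positive; a model that uses the opposite sign would need the sign flipped first. The function names are illustrative.

```python
def south_referenced_to_bearing(azimuth_deg: float) -> float:
    """Convert a south-referenced azimuth (0 = south, east negative,
    west positive) to a compass bearing (0 = north, clockwise)."""
    return (azimuth_deg + 180.0) % 360.0


def bearing_to_south_referenced(bearing_deg: float) -> float:
    """Inverse conversion; returns a value in [-180, 180)."""
    return (bearing_deg % 360.0) - 180.0
```

For example, a due-west array is 90 in the south-referenced convention and 270 as a compass bearing, and the “generator:orientation=227” example corresponds to 47 degrees west of south.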

Likewise, it would be good to clear up the nomenclature used when labelling the nominal capacity of the systems. In the wiki link @danstowell shared, they use terms like “power output” and “power capacity”, but it’s not clear what these mean. Preferably, systems should be labelled with the “Nominal AC capacity” and/or “Nominal DC capacity”; the DC figure is conventionally given in kWp (kilowatt-peak). This has the same dimensions as kW, but the “-peak” makes clear that the value refers to a nominal capacity, i.e. measured under Standard Test Conditions (STC), not a real power output.
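Because kWp is defined at STC (1000 W/m² irradiance, 25 °C cell temperature, AM1.5 spectrum), nominal DC capacity relates to module area through module efficiency alone: P_nom = A × 1000 W/m² × η. The sketch below just encodes that relation; the function name is made up, and the area here means actual module area, not a traced outline that includes gaps.

```python
STC_IRRADIANCE_W_PER_M2 = 1000.0  # irradiance defined by Standard Test Conditions


def nominal_dc_capacity_kwp(module_area_m2: float, module_efficiency: float) -> float:
    """Nominal DC capacity in kWp for a given module area and STC efficiency."""
    return module_area_m2 * STC_IRRADIANCE_W_PER_M2 * module_efficiency / 1000.0
```

Read the other way, a fixed 150 W/m² rule amounts to assuming 15% efficient modules that cover the whole mapped area.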

I think it’s key to get these standards in place asap to avoid wasted effort labelling systems and uncertainty later when utilising the OSM data…

It’s already clockwise from due North.

Ahhh, I was basing that comment on @TylerBusby333’s diagram. Would be good to clear up the capacity labelling though…

For capacity estimations, I’d say it would be counterproductive to follow the logic used by OpenInfraMap - this could lead to huge errors and really limit the usefulness.

We can guesstimate the nominal DC capacity of a system by counting the panels and using data from the Microgen Database and/or Photon database and/or MCS panel database to make probabilistic assumptions about the nominal capacity of each panel. The most probable panel nominal capacity will be different (smaller) for utility-scale solar compared with domestic, and the installation date would also have an effect. I think it would be possible to produce a collection of probability distributions of panel nominal capacity for different types of system (small, medium, large, old, new, rural, urban) from the MgDB/PhotonDB and use those to make informed assumptions about the likely system capacity. Obviously this would need to be done in post-processing once the location and number of panels had been tagged.
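A minimal sketch of that post-processing step is below. The panel-rating distribution is a hypothetical placeholder, not derived from MgDB, Photon or MCS data; a real version would hold one distribution per system type and installation period. It also assumes every panel in one system shares the same rating, drawn independently per system, which is why uncertainty narrows mainly when capacity is aggregated over many systems.

```python
import random
from typing import Dict, List, Tuple

# HYPOTHETICAL per-panel nominal capacity distribution (Wp -> probability).
HYPOTHETICAL_DOMESTIC_PANEL_WP: Dict[int, float] = {250: 0.3, 300: 0.4, 350: 0.3}


def most_probable_capacity_wp(n_panels: int, dist: Dict[int, float]) -> int:
    """Most probable panel rating multiplied by the panel count."""
    return n_panels * max(dist, key=dist.get)


def expected_capacity_wp(n_panels: int, dist: Dict[int, float]) -> float:
    """Expected system capacity under the rating distribution."""
    return n_panels * sum(wp * p for wp, p in dist.items())


def aggregate_capacity_percentiles(panel_counts: List[int], dist: Dict[int, float],
                                   n_samples: int = 2000,
                                   seed: int = 0) -> Tuple[int, int, int]:
    """Monte Carlo 5th/50th/95th percentiles of total capacity (Wp) over many
    systems, drawing one panel rating per system."""
    rng = random.Random(seed)
    ratings = list(dist)
    weights = [dist[r] for r in ratings]
    totals = []
    for _ in range(n_samples):
        draws = rng.choices(ratings, weights=weights, k=len(panel_counts))
        totals.append(sum(n * wp for n, wp in zip(panel_counts, draws)))
    totals.sort()

    def pct(q: float) -> int:
        return totals[round(q * (n_samples - 1))]

    return pct(0.05), pct(0.5), pct(0.95)
```

With the placeholder distribution, a 12-panel system gets a most probable capacity of 3600 Wp. The percentile function is where the regional or national aggregation benefit would show up.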

Apologies about the mislabeled diagram, I just picked the first picture I saw that used both orientation angle and tilt angle, as I think both are important in estimating the solar output at a given time and sun position. I agree orientation can start at north.

I’m not suggesting we have to follow the methodology OpenInfraMap uses for its solar output estimates exactly, but it doesn’t seem obvious to me that the number-of-panels approach would give a better power estimate than the area of a contiguous array, if other factors were taken into account and the area-to-power approximation were trained on lots of data. Especially for larger arrays, I think it’s much easier for someone to just map the area than to count all the panels. (To be clear, this is a commercial array that has no public data available about its output.)
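The simplest version of “training” the area-to-power approximation is a least-squares fit of a W/m² constant on systems whose capacity is known, optionally with separate constants for small and large arrays to capture the size effect mentioned at the top of the thread. This is a sketch only: the size threshold and the idea of two bands are illustrative assumptions, and a real fit would add the other factors discussed here.

```python
from typing import Sequence, Tuple


def fit_w_per_m2(areas_m2: Sequence[float], capacity_w: Sequence[float]) -> float:
    """Least-squares slope through the origin, minimising sum((c - k*a)^2)."""
    num = sum(a * c for a, c in zip(areas_m2, capacity_w))
    den = sum(a * a for a in areas_m2)
    if den == 0:
        raise ValueError("need at least one non-zero area")
    return num / den


def fit_banded(areas_m2: Sequence[float], capacity_w: Sequence[float],
               threshold_m2: float) -> Tuple[float, float]:
    """Fit separate W/m^2 constants below and above an assumed size threshold."""
    small = [(a, c) for a, c in zip(areas_m2, capacity_w) if a < threshold_m2]
    large = [(a, c) for a, c in zip(areas_m2, capacity_w) if a >= threshold_m2]
    if not small or not large:
        raise ValueError("both size bands need at least one system")
    return (fit_w_per_m2([a for a, _ in small], [c for _, c in small]),
            fit_w_per_m2([a for a, _ in large], [c for _, c in large]))
```

Comparing the fitted constants and residuals against a panel-count-based estimate on the same validation systems would be one way to settle which approach is more accurate in practice.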

Also, if there are multiple non-contiguous arrays on one house, it’s not immediately clear whether a node corresponds to one of the arrays or all of them, but if you have separate areas it’s immediately clear.

Lastly, I’d argue that the node style of tagging makes it more difficult for machines to reason about the solar panels. For instance, if I have an image that only contains the top two solar arrays on the left of the image above, but there’s no OSM solar node in them, I don’t know that they’re already accounted for. But if they’re accurately mapped as areas, then I can filter them out and not present that image tile to a human user.
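That filtering step can be sketched with a point-in-polygon test: a tile only needs human review if some detected panel location is not already inside a mapped solar area. The even-odd ray-casting test below is a standard technique; the function names and the idea of feeding it per-tile detection points are assumptions for illustration. Points exactly on a polygon edge are ambiguous under this test.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting; ring is [(x, y), ...] without a repeated closing vertex."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def needs_human_review(detections: Sequence[Point],
                       mapped_areas: List[Sequence[Point]]) -> bool:
    """True if any detected panel location lies outside every mapped area."""
    return any(
        not any(point_in_polygon(x, y, ring) for ring in mapped_areas)
        for x, y in detections
    )
```

With point nodes there is no equivalent test: a detection a few metres from a node might or might not be the same installation.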

I do think that the number of panels on a node could be good as a starting point, and may work better for certain array configurations and levels of satellite imagery, but I think we also need to take into account the cases I’ve outlined. Also, for most of the arrays I see in the US, counting panels is more intensive and less instantly verifiable by eye than just outlining the array.

I take your point; I’m not convinced either approach to estimating the nominal capacity of the systems will be sufficiently accurate for the capacity estimates to be useful in my applications (modelling nationally- and regionally-aggregated PV outturn for the GB TSO).

I certainly wouldn’t expect people flagging systems to manually count panels for large systems; these we could count with an ML algorithm. An undergrad student of ours is presenting some work at the PVSAT15 conference next week in which she used a very simple Gaussian blob detection approach to count PV panels in satellite images of PV systems with reasonable accuracy.

The work flow could be something like:

  1. We identify probable PV system locations through a combination of ML and static datasets from Ofgem/BEIS.
  2. The OSM community reviews the locations and traces the boundary of each contiguous array.
  3. A second ML algorithm processes the satellite imagery within the traced boundaries to count modules and add a label.
  4. We apply a capacity label by assuming the most probable panel nominal capacity and multiplying by N panels.

The issue with using area as a proxy for capacity is that modules use different cell sizes and different numbers of cells, and have very different efficiencies, so the choice of a constant Wp/m^2 conversion could easily lead to bias, whereas a statistically driven approach would hopefully be bias-free, and reasonably accurate if the capacity is aggregated regionally or nationally. That said, there is definitely some information in the array area if we also know the number of panels - we could guesstimate whether they are 60- or 72-cell modules and use that to narrow down the most probable panel capacity.
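Given a traced area and a panel count, the 60- vs 72-cell guess could be as simple as picking the module type whose typical footprint is closest to the area per module. The footprints below (about 1.63 m² and 1.94 m²) are assumed typical values for modules built from 156 mm cells, not figures from any of the databases mentioned. The packing factor, also an assumption, discounts gaps inside a traced outline.

```python
from typing import Dict

# ASSUMED typical module footprints in m^2; real values vary by manufacturer.
ASSUMED_MODULE_AREA_M2: Dict[str, float] = {"60-cell": 1.63, "72-cell": 1.94}


def guess_module_type(array_area_m2: float, n_panels: int,
                      packing_factor: float = 1.0) -> str:
    """Return the module type whose assumed footprint is nearest to the
    area per module. packing_factor is the assumed fraction of the traced
    area actually covered by modules (1.0 = no gaps)."""
    if n_panels <= 0:
        raise ValueError("n_panels must be positive")
    per_module = array_area_m2 * packing_factor / n_panels
    return min(ASSUMED_MODULE_AREA_M2,
               key=lambda k: abs(ASSUMED_MODULE_AREA_M2[k] - per_module))
```

The guessed type would then select which panel-capacity distribution to use in the capacity step, rather than feeding a W/m² constant directly.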

1 Like

Definitely agree that systems should ideally be mapped as areas, not nodes, so that there is no confusion with respect to non-contiguous arrays or nearby systems. It also makes it more likely that we spot systems that have been removed.

1 Like

Found some more corner cases: what should best practice be for arrays like this? Mapping them as a multipolygon with holes seems unreasonable, but if an area is mapped around the outline of each, it seems like we would overestimate the output due to the irregularity and holes.

These corner cases are good illustrations. Likewise, in aerials of UK solar farms I see lots of different strategies for laying out panels, with spaces between that often would be a real pain to draw multipolygons around.

I would say we map an outline of each, don’t worry about the holes. (This corresponds to current advice from others to UK mappers.) (“Micro-mapping” would be a barrier to scaling the project up.) Would I be safe to hope that we could use a simple colour-based pixel classifier to estimate how much of a mapped area is actual PV surface area? I’d hope it could be as simple as an adaptive threshold.
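As a starting point for the pixel-classifier idea, here is a sketch using Otsu's method, a global threshold chosen automatically from the image histogram (a swapped-in stand-in for a locally adaptive threshold). It assumes the imagery inside the mapped outline has already been reduced to 8-bit greyscale values and that panels are darker than their surroundings, which will not hold for every roof or ground surface.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Otsu's method on 8-bit grey values: return the threshold t that
    maximises between-class variance, with pixels <= t forming the dark class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_dark, sum_dark = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def pv_fraction(pixels: Sequence[int]) -> float:
    """Fraction of pixels in the mapped outline classed as dark (assumed PV)."""
    t = otsu_threshold(pixels)
    return sum(1 for p in pixels if p <= t) / len(pixels)
```

Multiplying the outline's area by this fraction would give a rough PV surface area that ignores the holes, without anyone having to micro-map them.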

You’re correct: imagery could be queried for the OSM area and image segmentation run over it to at least produce an estimate of panel area. Here’s an example of segmenting one image tile:

And it wouldn’t be too hard to do segmentation over a whole “cluster” of tiles to get a total panel area figure.
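Turning per-tile segmentation masks into square metres mostly needs the ground resolution of each tile. For standard 256-pixel Web Mercator (slippy-map) tiles, one pixel covers 2πR·cos(lat) / (256·2^zoom) metres, with R = 6378137 m. The sketch below assumes masks arrive as 2-D lists of 0/1 values and uses one latitude per tile, which is reasonable at the high zoom levels used for panels. It also assumes tiles in a cluster do not overlap, so nothing is double counted.

```python
import math
from typing import Iterable, List, Tuple

WEB_MERCATOR_RADIUS_M = 6378137.0
TILE_SIZE_PX = 256  # assumed standard slippy-map tile size


def metres_per_pixel(lat_deg: float, zoom: int) -> float:
    """Ground resolution of a Web Mercator tile pixel at a given latitude."""
    return (2 * math.pi * WEB_MERCATOR_RADIUS_M * math.cos(math.radians(lat_deg))
            / (TILE_SIZE_PX * 2 ** zoom))


def total_panel_area_m2(tiles: Iterable[Tuple[float, int, List[List[int]]]]) -> float:
    """Sum panel area over (lat_deg, zoom, mask) tiles, where mask is a 2-D
    list of 0/1 values from a segmentation model (1 = panel pixel)."""
    total = 0.0
    for lat, zoom, mask in tiles:
        res = metres_per_pixel(lat, zoom)
        panel_px = sum(sum(row) for row in mask)
        total += panel_px * res * res
    return total
```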

I would imagine this estimated area isn’t really human-verifiable, and OSM would generally be opposed to having a tag representing it. Maybe we need something like a meta-OSM that stores extra information about OSM objects.

I agree with @danstowell that drawing around each sub-array will limit the scalability, so it’s probably better for humans to trace the outer boundary of the arrays and handle the holes separately. We have some lightweight code here at TUOS that counts panels once the PV system has been located in an image, and I don’t think it would be much of a stretch to tweak it to also estimate the area covered.
I’m not familiar enough with OSM to know whether this kind of post-processing is achievable on a large scale, though. Or maybe this type of processing belongs as a layer outside the core OSM? I think @jack previously mentioned the idea of having different “views” of the OSM data that have additional post-processing and/or reformatting to make them more useful for a particular application.
The point about being able to distinguish between tags that have been derived in this way and tags that are more robustly determined is a good one - I think there will be some examples where a keen OSM contributor will accurately determine the area and/or manually count the panels, or they might get this data by searching for the system in one of the public datasets (FIT, RO, REPD etc). Again, I don’t know enough about OSM to know how this should be handled.
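One way the “meta-OSM” layer could keep derived and robust values apart is a sidecar store keyed by OSM element, where each value carries its provenance and, for ML output, a confidence. Everything here is a hypothetical design: the provenance categories, the preference order (manual first, then public dataset, then ML) and the key names are assumptions for illustration, not an existing OSM schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Provenance(Enum):
    MANUAL = "manual"           # counted or measured by a mapper
    PUBLIC_DATASET = "dataset"  # matched to a public register such as FIT, RO or REPD
    ML_DERIVED = "ml"           # post-processed from imagery


# Assumed preference order: lower numbers are trusted more.
PREFERENCE = {Provenance.MANUAL: 0, Provenance.PUBLIC_DATASET: 1, Provenance.ML_DERIVED: 2}


@dataclass(frozen=True)
class SidecarValue:
    osm_type: str   # "way" or "relation"
    osm_id: int
    key: str        # hypothetical keys, e.g. "panel_count" or "panel_area_m2"
    value: float
    provenance: Provenance
    confidence: Optional[float] = None  # e.g. a model score for ML-derived values


def best_value(records: List[SidecarValue], osm_type: str, osm_id: int,
               key: str) -> Optional[SidecarValue]:
    """Return the most trusted record for one element and key, or None."""
    matches = [r for r in records
               if (r.osm_type, r.osm_id, r.key) == (osm_type, osm_id, key)]
    if not matches:
        return None
    return min(matches, key=lambda r: PREFERENCE[r.provenance])
```

Keeping this outside OSM avoids having to negotiate confidence tags with the wider community, while still letting downstream users choose which provenance levels they accept.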

Seems like it would be a good idea to start drafting a refined and more complete set of best practice instructions for the OSM wiki?

It’d be fine to have the auto-panel-counting always sit outside OSM. Importing the data directly into OSM would need more negotiation, and I don’t think it’s necessarily needed. Assume as the base case that we count/measure panels as a postprocessing external to OSM, and then later we can consider whether the data are good enough to propose actually putting the panel-counts into OSM itself (the tagging would be fairly straightforward IMHO).

Since OSM is crowd-curated, it’s standard practice for any application to have a pre-processing extraction step to convert tagging into a fixed form, with a set of mappings (such as the ones Russ uses to guess MWp, mentioned above), remove outliers etc.
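A small piece of that extraction step might look like the following: parse free-text output values such as “4 kW” or “3.5 MWp” into watts, and flag values whose implied power density is implausible for the mapped area. The regular expression, the unit table and the 250 W/m² upper bound are assumptions for this sketch. Values outside the simple pattern (including “yes”) come back as None rather than being guessed.

```python
import re
from typing import Optional

_UNIT_FACTORS = {"w": 1.0, "kw": 1e3, "mw": 1e6, "gw": 1e9,
                 "wp": 1.0, "kwp": 1e3, "mwp": 1e6, "gwp": 1e9}
_VALUE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([A-Za-z]+)\s*$")


def parse_output_w(raw: Optional[str]) -> Optional[float]:
    """Parse a generator output tag value such as '4 kW' or '3.5 MWp' into
    watts. Returns None for values like 'yes' or unrecognised units."""
    if raw is None:
        return None
    m = _VALUE_RE.match(raw)
    if not m:
        return None
    number, unit = m.groups()
    factor = _UNIT_FACTORS.get(unit.lower())
    return float(number) * factor if factor is not None else None


def is_plausible(output_w: float, area_m2: Optional[float],
                 max_w_per_m2: float = 250.0) -> bool:
    """Flag outliers whose implied density exceeds an assumed upper bound."""
    return area_m2 is None or output_w <= max_w_per_m2 * area_m2
```

Note that this parser treats “MW” and “MWp” identically, so it silently inherits the ambiguity discussed in the next post; any agreed convention would need to be reflected here.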

Building up the best-practice guidance: yes sure. Two things to bear in mind:
(1) we don’t quite have the final say within this group, it’s more of a negotiation with the active OSM groups. For example, the lack of clarity about “MW” vs “MWp” etc is a bit tricky, because the approved OSM standard for power tagging exists, but doesn’t actually clarify that point. So it’d take a bit of discussion with whoever else has an active interest in mapping of power infra.
(2) the simpler the specification, the more volunteer help we get!

There are a few more variables that affect output, such as the country in which the solar panels are installed and that country’s seasonal pattern, if we are planning to apply the same logic to mapping solar panels across the world with participants from other countries. Accounting for these would help bring the estimated output closer to the true value.