We’re talking with a company that owns lots of UK solar PV power data (some is 1-minute resolution, some is half-hourly). I’m really excited because they’re being very generous and they are considering publishing this PV power data; but only if we anonymise the data first. That is, we need to modify the data so that it’s not possible to identify which exact house each data stream comes from.

We’ve considered two approaches to this. Neither is perfect, so we’d love your thoughts:

**Reduce the precision of the latitude and longitude**

For example, we’ll remove the last few decimal places of the latitude and longitude, such that the resulting lat/lng is accurate to about 1 km. This is our preferred approach.
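As a rough sketch of what this could look like (the function names are made up for illustration, and rounding rather than truncating is an assumption): keeping two decimal places gives cells about 1.1 km tall. East-west the cells are narrower, because a degree of longitude shrinks with the cosine of the latitude, so across the UK a 0.01° step is roughly 0.6 to 0.7 km.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def coarsen(lat: float, lng: float, decimals: int = 2) -> tuple[float, float]:
    """Round a coordinate to a fixed number of decimal places."""
    return round(lat, decimals), round(lng, decimals)


def cell_size_km(lat: float, decimals: int = 2) -> tuple[float, float]:
    """Approximate (north-south, east-west) size of one rounded cell, in km."""
    step = 10 ** -decimals
    north_south = step * KM_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west


print(coarsen(51.507351, -0.127758))  # (51.51, -0.13)
```

Note that rounding snaps each system to the nearest cell centre, whereas truncating would snap it to a cell corner; either works, as long as it is applied consistently.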

**Spatially aggregate the data into 1km x 1km grid boxes**

First, we’d convert the raw data from kW to ‘yield’, that is, the *proportion* of each system’s maximum power output. Then we’d divide the country into 1km x 1km grid boxes and, for each grid box, report the mean yield of all the PV systems within it. A problem with this solution is that different PV systems have different characteristics (shading, tilt, angle, inverter power caps, etc.), so it might not be appropriate to just take the mean.
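To make the aggregation concrete, here is a minimal sketch for a single timestep. It assumes each system’s location is already in projected coordinates in metres (for example, British National Grid eastings and northings), and that ‘maximum power output’ means the system’s rated capacity. The records and function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (easting_m, northing_m, power_kw, capacity_kw).
systems = [
    (530_120, 180_450, 1.0, 4.0),
    (530_870, 180_910, 1.5, 3.0),
    (531_300, 180_200, 0.5, 2.0),
]


def grid_box(easting_m, northing_m, size_m=1000):
    """Index of the grid box containing this point."""
    return int(easting_m // size_m), int(northing_m // size_m)


def mean_yield_by_box(records):
    """Mean yield (power / capacity) of the systems in each grid box."""
    yields = defaultdict(list)
    for easting, northing, power_kw, capacity_kw in records:
        yields[grid_box(easting, northing)].append(power_kw / capacity_kw)
    return {box: mean(values) for box, values in yields.items()}


result = mean_yield_by_box(systems)
print(result)  # {(530, 180): 0.375, (531, 180): 0.25}
```

A plain mean weights every system equally; a capacity-weighted mean or a median would be alternatives worth comparing.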

A problem with both approaches is that, in remote areas, some 1km x 1km areas might only contain a single house with PV.

A work-around for the spatial aggregation technique might be that, for any grid box containing fewer than, say, 10 houses with PV, we’d ‘merge’ that grid box with neighbouring grid boxes until the ‘merged’ box contains at least 10 houses with PV. That is, all the ‘merged’ grid boxes would be tied together, and would report the same value.
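One possible way to do the merging is sketched below. This is only an illustration under some assumptions: it is a simple greedy strategy (repeatedly merge the smallest undersized group into its nearest group), ‘nearest’ is measured between grid indices so that empty boxes in between don’t block a merge, and the function name and example counts are hypothetical.

```python
def merge_small_boxes(counts, min_systems=10):
    """Greedily merge undersized groups of grid boxes with their nearest group.

    counts maps (col, row) grid indices to the number of PV systems in that box.
    Returns a list of (set_of_boxes, total_systems) groups.
    """
    groups = [({box}, n) for box, n in counts.items() if n > 0]

    def gap(a, b):
        # Squared distance between the closest pair of boxes in two groups.
        return min((x1 - x2) ** 2 + (y1 - y2) ** 2 for x1, y1 in a for x2, y2 in b)

    while len(groups) > 1:
        small = [g for g in groups if g[1] < min_systems]
        if not small:
            break
        # Take the smallest group and merge it into its nearest neighbour.
        target = min(small, key=lambda g: g[1])
        groups.remove(target)
        nearest = min(groups, key=lambda g: gap(target[0], g[0]))
        groups.remove(nearest)
        groups.append((target[0] | nearest[0], target[1] + nearest[1]))
    return groups


counts = {(0, 0): 3, (1, 0): 4, (2, 0): 5, (10, 10): 12}
merged = merge_small_boxes(counts)
```

In this example the three small boxes end up tied together as one group of 12 systems, and the box at (10, 10) is left alone. If the whole dataset has fewer than `min_systems` systems, the single remaining group will still be undersized, so that case would need handling separately.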

A work-around for the ‘reduced precision’ solution might be to further reduce the precision of the lat/lng in remote areas, so that the lat/lng refers to an area with multiple PV systems.
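A sketch of how that might work: for each system, try progressively coarser rounding and keep the finest level at which its cell contains enough systems. The function name and the threshold are assumptions for illustration, and if no level is safe the sketch withholds the location entirely (returns `None`) rather than publishing it.

```python
from collections import Counter


def adaptive_coarsen(points, min_systems=10, max_decimals=2):
    """For each (lat, lng), use the finest rounding whose cell holds enough systems."""
    levels = range(max_decimals, -1, -1)  # e.g. 2, 1, 0 decimal places
    # Count how many systems share each rounded cell, at every precision level.
    counts = {
        d: Counter((round(lat, d), round(lng, d)) for lat, lng in points)
        for d in levels
    }
    result = []
    for lat, lng in points:
        for d in levels:
            cell = (round(lat, d), round(lng, d))
            if counts[d][cell] >= min_systems:
                result.append(cell)
                break
        else:
            result.append(None)  # no level is safe: withhold the location
    return result


points = [
    (51.501, -0.101), (51.502, -0.102), (51.503, -0.099),  # dense cluster
    (57.31, -4.42), (57.38, -4.36),                        # two remote systems
]
print(adaptive_coarsen(points, min_systems=2))
```

With a threshold of 2, the clustered systems keep two decimal places, while the two remote systems only share a cell once rounded to whole degrees. One thing to watch: cells at different precision levels overlap, so the published precision itself reveals something about how sparse the area is.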

Any thoughts / concerns? Any challenges we’ve missed?

*edit*: Changed from “fewer than 10 houses” to “fewer than 10 houses with PV”, after Jamie’s suggestion on Twitter :0