Solar irradiance datasets

How good are existing, gridded solar irradiance datasets? If no good datasets exist, should we create one?!

To be specific: I need solar irradiance at the Earth’s surface, at high temporal and spatial resolution (15 minutely data @ 1km grid spacing would be great!), and at regular grid spacings, for several years. Ideally with uncertainty estimates.

Ground-based measurements of irradiance provide excellent accuracy but very poor spatial coverage.

Satellite estimates of irradiance provide great spatial coverage, but I’ve heard researchers say that satellite estimates provide poor accuracy, especially in developing countries. (These estimates basically look at satellite images of the Earth’s surface, and infer surface irradiance from the brightness of each pixel. But, of course, the brightness of each pixel is affected by the albedo & angle of the surface, and by the atmosphere between the Earth’s surface and the satellite.)

Re-analyses like ERA5 are great for many meteorological parameters, but I’ve heard that they’re not great for irradiance.

If no good estimates of irradiance exist, should we look into developing an open dataset of irradiance, at high temporal and spatial resolution?! The basic idea might be to learn to infer irradiance from satellite imagery of clouds & water vapour, pollution data, and reanalyses. Use as much ground-based data as possible to train & calibrate the model (e.g. PV power output data). Maybe start in the UK. Does this sound like a good idea?!?

So what you describe is essentially what companies like SolarGIS or Solcast already do. They all operate in two different modes: bankable historic irradiance, and forecasting.
Having seen, and worked directly on, how Solcast is built, I can tell you that performance does not depend on how developed a nation is, unless there are microclimate-based features that are not captured by numerical weather prediction or reanalysis datasets. There is, however, a lack of ground data to validate against at these locations, which might be where this comment comes from.

If you were to build your own dataset, you would have to design a tool that directly competes with these companies. This is doable of course, as most of the inputs are free (e.g. MERRA2 reanalysis, older satellite imagery), and the data required to train it are also free (BSRN, UKMO, other irradiance ground data).
Also, there would be very little reason not to process the whole satellite disk, other than data storage. You could select pixel regions of interest, e.g. the entire UK, but why not capture all the important areas within the satellite’s whole field of view?

I can certainly help with this, if it is the path you want to go down. The hard part is having all the variables required in a simple accessible format.


Thanks loads for the reply!

Do you think there would be value in having a decent open irradiance dataset? (Even if the open irradiance dataset doesn’t out-perform irradiance datasets from folks like Solcast and SolarGIS?)

It feels like doing this out in the open would have two advantages:

  1. The research community can get involved in the algorithms used to infer irradiance.
  2. The research community can consume the open irradiance datasets for down-stream research on solar resource assessment, disaggregating solar PV from substation power data, solar gain in buildings, etc.

From personal experience and from reading a few papers I get the impression that CMSAF irradiance data isn’t very good in the UK. It tends to be quite biased. The main issues are:

  • Assumptions about the optical depth of the atmosphere and aerosol content are based on long-term averages from continental Europe, and are not representative of the UK.
  • The UK is near the edge of the satellite’s field of view, so parallax is more of an issue.


I think the SARAH and ERA5 datasets are meant to be better than CMSAF in the UK, but they will still have issues with parallax, and with the fact that satellite observations are instantaneous when most applications ideally need averaged or integrated values.

N.B. I think it would be a good idea to involve Thomas Huld and his colleagues at the EC JRC; they’re specialists in European satellite-derived irradiance observations, so it might be worth contacting him for some insight. He’ll know the full spectrum of freely available products/datasets and can advise which is best for the UK. Might even be interested to collaborate…

I’d be particularly interested to explore the concept of assimilating PV generation observations into a satellite-derived reanalysis set of historical irradiance observations. The data from PV systems ought to be useful for:

  • Increasing the spatial resolution of the irradiance data and improving the correction for parallax.
  • Translating 15 or 30 minutely instantaneous satellite observations into 30 or 60 minutely integrated irradiation observations.
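
As a rough illustration of the second bullet, here is a minimal sketch of the naive baseline: turning instantaneous irradiance samples into period-integrated irradiation with the trapezoidal rule. It assumes irradiance varies linearly between satellite samples, which is exactly the assumption that PV data could help relax. The function name and the sample values are made up for illustration.

```python
def integrate_irradiance(times_min, ghi_w_m2, period_min=30):
    """Integrate instantaneous GHI samples (W/m^2) into irradiation (Wh/m^2)
    per period, using the trapezoidal rule.

    Assumes samples are time-ordered and aligned with period boundaries,
    and that irradiance varies linearly between consecutive samples.
    """
    totals = {}
    for i in range(len(times_min) - 1):
        t0, t1 = times_min[i], times_min[i + 1]
        g0, g1 = ghi_w_m2[i], ghi_w_m2[i + 1]
        # Start time (in minutes) of the period this interval falls in.
        period_start = (t0 // period_min) * period_min
        # Mean irradiance over the interval times its length in hours.
        energy_wh = 0.5 * (g0 + g1) * (t1 - t0) / 60.0
        totals[period_start] = totals.get(period_start, 0.0) + energy_wh
    return totals


# Illustrative (made-up) 15-minutely instantaneous samples.
times = [0, 15, 30, 45, 60]
ghi = [400.0, 500.0, 300.0, 200.0, 400.0]
half_hourly = integrate_irradiance(times, ghi, period_min=30)
print(half_hourly)
```

A cloud passing between two satellite frames is invisible to this baseline, whereas a PV system underneath it would record the dip.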

We might also consider assimilating ground-based observations of irradiance (e.g. UKMO MIDAS) into the reanalysis set, though naturally that will mean undermining it as a validation set.

I think there is huge value in a publicly-available historical irradiance/irradiation dataset that is more accurate than what is available right now (or perhaps “more localised” is a better term).


“Do you think there would be value in having a decent open irradiance dataset?”

Yes, so long as it validates competitively. Two features matter most for a forecast: getting the magnitude of irradiance correct, and getting the ramp timings correct. So long as these two things are done reasonably well, the dataset would be useful if it were easily accessible via an API or similar.
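
To make those two criteria concrete, here is a minimal sketch of how they might be scored. The magnitude part uses the standard mean bias error and RMSE. The ramp-timing part is a deliberately crude stand-in of my own (compare when the single largest step change occurs in each series), not an established metric. All names and values are illustrative.

```python
import math


def magnitude_errors(obs, pred):
    """Return (mean bias error, RMSE) of pred against obs."""
    errors = [p - o for o, p in zip(obs, pred)]
    mbe = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mbe, rmse


def largest_ramp_index(series):
    """Index of the timestep at which the largest absolute change starts."""
    ramps = [abs(b - a) for a, b in zip(series, series[1:])]
    return ramps.index(max(ramps))


def ramp_timing_error(obs, pred):
    """Timesteps by which the predicted largest ramp lags the observed one
    (positive = prediction is late). A crude, hypothetical metric."""
    return largest_ramp_index(pred) - largest_ramp_index(obs)


# Made-up GHI series (W/m^2): the prediction sees the cloud one step late.
obs = [600.0, 610.0, 200.0, 190.0, 580.0]
pred = [600.0, 600.0, 600.0, 210.0, 590.0]

mbe, rmse = magnitude_errors(obs, pred)
lag = ramp_timing_error(obs, pred)
print(mbe, rmse, lag)
```

Note how one late ramp dominates the RMSE even though the magnitudes are otherwise close, which is why the two aspects are worth scoring separately.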
Having a real-time nowcasting tool will be much harder than a bankable historic radiation dataset. I like Jamie T’s suggestion of including Thomas Huld; he has worked in this field for a very long time and will have good advice if he is willing to give it.
The work required to produce a nowcasting tool is quite immense. The methodologies needed to calculate an estimate from satellite images and other inputs are pretty straightforward and well documented; most of the research problem is just being able to work with the data conveniently.

@JamieTaylor, I did a bit of work on integrating real-time PV generation observations into satellite-derived observations.
It worked really well and Solcast will be doing something similar in the future.

You really have to approach this challenge in two modes: Historic and forecast. Historic can take advantage of all the gridded datasets etc.
IMO, clear-sky irradiance anywhere on the planet is accurate enough using MERRA2 reanalysis, OMI nitrogen and the REST2 clear-sky model (I have a paper coming out v. shortly that I will link here when available that demonstrates this).
The only additional feature is the clear-sky index as derived from satellite imagery. In my experience, there has been a disappointing amount of public research detailing these specifics (as it is basically the bread and butter of the companies involved). However, it is basically informed by some historic daily trends of each pixel, a pre-processing of the image to remove parallax errors, and a conversion from pixel brightness to received irradiance. This conversion can be learned from all the ground data found within the satellite domain. E.g. SAURAN, BSRN, UKMO and any others can be used to provide time-stamped clear-sky index values, which can be compared to the corresponding satellite pixel brightness. The conversion can then be fitted.
At this point, you essentially have a basic and reasonable estimate of GHI for history.
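
As a toy version of that last step, the sketch below computes the clear-sky index (measured GHI divided by modelled clear-sky GHI) at ground stations, fits a straight line from pixel brightness to clear-sky index, and applies it to a new pixel. The linear form is my simplifying assumption and the numbers are made up; a real conversion would also need the per-pixel history and parallax correction mentioned above.

```python
def clear_sky_index(ghi_measured, ghi_clear):
    """Clear-sky index: measured GHI as a fraction of clear-sky GHI."""
    return ghi_measured / ghi_clear


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up matched samples: normalised pixel brightness at ground stations,
# the GHI measured there, and the modelled clear-sky GHI at the same time.
brightness = [0.1, 0.3, 0.5, 0.7, 0.9]
ghi_ground = [720.0, 560.0, 400.0, 240.0, 80.0]
ghi_clear = [800.0, 800.0, 800.0, 800.0, 800.0]

kc = [clear_sky_index(g, c) for g, c in zip(ghi_ground, ghi_clear)]
intercept, slope = fit_line(brightness, kc)


def estimate_ghi(pixel_brightness, ghi_clear_value):
    """Estimate GHI for a pixel from its brightness and clear-sky GHI."""
    kc_est = max(intercept + slope * pixel_brightness, 0.0)  # no negative GHI
    return kc_est * ghi_clear_value


print(intercept, slope, estimate_ghi(0.4, 750.0))
```

Brighter pixels mean more cloud, so the fitted slope comes out negative.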

Nowcasting gets harder. You have to integrate more NWP models, perhaps whole ensembles, in order to get the atmospheric constituents that contribute to the clear-sky curve. Then there is the business of advecting clouds in the image forward in time in order to get the future clear-sky index. This is much more difficult.


This is a super-interesting discussion, thank you @JamieTaylor & @JBright!

(Sorry for the slow reply - I was on holiday in Cornwall last week. Wow - Cornwall really does have a lot of PV!)

After your advice, and after thinking about it whilst on holiday, I’d propose that the first coding work that I do will be the “historic” part that @JBright mentioned. Specifically: Try training a deep neural network to map from satellite data & numerical weather predictions to proportion of PV output. (Don’t worry about forecasting into the future, yet.)

As well as interpreting the brightness of the pixel representing the location of the PV system of interest, I’d hope that the ML model will learn how clouds, aerosols, water vapour etc. affect the passage of sunlight through the atmosphere.

To be specific: Maybe the net’s input would be satellite imagery for a 500 km x 100 km box, centred on the target site, and containing all relevant spectral channels; and perhaps relevant parameters from NWPs. The output would be the proportion of PV output (in the range [0, 1]) for a single PV site, located in the centre of the input image.
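
To pin down what one training example might look like, here is a minimal sketch that crops a window of satellite pixels centred on a PV site and normalises that site’s power to [0, 1]. It uses plain nested lists for a single channel; the function names, the tiny grid, and the use of installed capacity as the normaliser are all assumptions for illustration.

```python
def crop_centred(image, row, col, half_height, half_width):
    """Return the window of `image` centred on (row, col), or None if the
    window would extend beyond the image edge."""
    top, bottom = row - half_height, row + half_height + 1
    left, right = col - half_width, col + half_width + 1
    if top < 0 or left < 0 or bottom > len(image) or right > len(image[0]):
        return None
    return [r[left:right] for r in image[top:bottom]]


def normalised_pv(power_kw, capacity_kw):
    """PV output as a proportion of capacity, clipped to [0, 1]."""
    return min(max(power_kw / capacity_kw, 0.0), 1.0)


# Toy single-channel "satellite image": 5 rows x 7 columns of pixel values.
image = [[r * 10 + c for c in range(7)] for r in range(5)]

# One training example for a hypothetical PV site at pixel (2, 3).
window = crop_centred(image, row=2, col=3, half_height=1, half_width=2)
target = normalised_pv(power_kw=2.0, capacity_kw=4.0)
print(window, target)
```

Sites too close to the image edge return `None` here; a real pipeline would need to decide whether to pad or skip them.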

If we have time, maybe we can have an auxiliary objective function, which is to predict the vertical profile of the clouds (using cloud-penetrating lidar / radar data). If the ML model can learn about the 3D structure of clouds, that should help it predict PV power.

This model would be trained on a huge amount of data. Sheffield Solar’s Microgen DB + hopefully will give us clean-ish data from around 1 million PV systems. If each system records just 1,000 timesteps then we have 1 billion training examples. Which should be plenty :)

At inference time, you can further calibrate the predictions using live PV data.

If this model works well, it should be possible to “sweep” this model over any area of interest, to create a historical dataset of ‘proportion of PV power’.

Then, if this works, I’d propose building a super-simple nowcasting model (advect the clouds according to the wind speed and direction from NWPs, combined with inferred wind speed from recent cloud movements). These ‘nowcast cloud images’ would feed into the ‘PV output predictor’ to create a clumsy PV nowcasting system, which probably won’t work very well, but it’ll be a start.
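
For what it’s worth, the crudest possible version of that advection step might look like the sketch below: shift the whole cloud field by a whole number of pixels given a single wind vector. It assumes uniform wind, row 0 at the northern edge, and clear sky (index 1.0) entering at the upwind edge. The names and numbers are made up.

```python
def advect(field, wind_u_ms, wind_v_ms, dt_s, pixel_m, fill=1.0):
    """Shift a 2D field by a uniform wind over dt_s seconds.

    wind_u_ms is the eastward component, wind_v_ms the northward one.
    Row 0 is assumed to be the northern edge, so northward motion means a
    decreasing row index. Pixels with no upwind source get `fill`.
    """
    dx = round(wind_u_ms * dt_s / pixel_m)  # columns moved eastward
    dy = round(wind_v_ms * dt_s / pixel_m)  # rows moved northward
    rows, cols = len(field), len(field[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            src_r, src_c = r + dy, c - dx  # where this pixel's air came from
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = field[src_r][src_c]
    return out


# Toy clear-sky index field: one cloudy pixel (0.3) at the western edge.
field = [
    [1.0, 1.0, 1.0, 1.0],
    [0.3, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

# 10 m/s eastward wind, 15 minutes ahead, 3 km pixels -> 3 pixels east.
nowcast = advect(field, wind_u_ms=10.0, wind_v_ms=0.0, dt_s=900, pixel_m=3000)
print(nowcast)
```

This ignores cloud growth and decay and different winds at different heights, which is part of why it probably won’t work very well.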

And then, once the basic pipeline works, we can get stuck into @JBright’s proposal of comparing different nowcasting algorithms, and try to make solid progress out-in-the-open, and SAVE THE WORLD.

Ahem. :)

Does that sound sensible?


Hi Jack, all.

I’m not an expert here, and you may have already considered and discounted this. However, something I meant to suggest when I initially saw your project via Twitter: for North America in particular you may find the PVLIB Python library useful.

The library provides methods to access a range of different models providing irradiance and PV power forecasts: “Users can retrieve standardized weather forecast data relevant to PV power modelling from NOAA/NCEP/NWS models including the GFS, NAM, RAP, HRRR, and the NDFD” (GFS is actually global, but is limited to 0.25 degree (so ~25km) resolution; with 3 hour time resolution, so probably doesn’t meet your needs).

The library also contains a set of methods to turn the irradiance forecasts into PV power forecasts, including taking into account specific PV system parameters (orientation, make/model of panel and inverter and their efficiencies/system losses, etc.).

You can find the docs here
It’s on github here

FWIW Will Holmgren, one of the developers / maintainers at University of Arizona, was super-friendly and helpful to me when I had some challenges using the library on a hobby modelling project a year or so ago.


Hi Matt! Great to see you on here! Thanks for the suggestion - PVLIB looks very useful.

I just produced this graphic for a paper validating a solar irradiance database. It shows all the meteorological satellites that can be used (some are historic or backups, e.g. GOES-13/15, Himawari 7 and 9); these would be the satellites you need live access to in order to produce a global forecasting tool.
Fengyun-4 is dashed as it is not used in the dataset I am validating but represents an obvious opportunity.
