Weather market data supplier
Data journey
Sourcing the data
The data supply firm sources weather data from 118 countries around the world, including data from Weston Park weather station in Sheffield. The data production and processing practices at Weston Park and the Met Office are typical precursors to the data arriving at the firm.
Traditionally, much of this weather data was gathered from national meteorological services such as the Met Office, but increasingly data are being sourced from alternative suppliers. These alternative suppliers include commercial and private weather networks, networks set up for research purposes, and amateur observers.
“We regard the national met offices as actually, probably now in terms of the numbers of datasets we get they’re in a minority. You know, the amount–, weather data is increasingly ubiquitous, and the importance of those data sources I think will fall. They will always be higher quality than other data sources, because they’re posher instruments.
“But you know there are so many other networks, so you get regional networks, you get academic networks, you get the local Ag[ricultural] networks. Then the really big contribution is these private observers who increasingly are joining up to public networks and putting their data out there for the benefit of all.”
Despite this proliferation of new data sources, there is still a preference for the high-quality data of national meteorological services. In some cases, such as the USA, this data is available for the firm to use free of charge as open data. However, in other cases, such as the UK’s Met Office, data is charged for on a wholesale basis.
Sometimes there is a market need for data in a location without enough high-quality weather stations on which to base a contract. Similarly, the firm may suspect that the available weather data is of low quality or open to tampering.
In such cases, where high-quality data is needed to support a large, high-value contract, the firm has been involved in installing new observation networks set up for specific contracts. Public sector and commercial partners have also been involved in setting up and running some of these networks with the firm.
For the weather data supply firm, one significant benefit of this approach is that all of the new stations are set up and managed in a standard way, with the same equipment, location criteria, observation practices and metadata.
Data aggregation
This large volume of historical and real-time observation data from numerous sources is aggregated by the firm.
Current weather data is ingested daily over computer networks, and is transferred into the firm’s own weather data software platform. This platform is used by the firm as a repository and data management tool, as well as for its forecasting services.
“In our core database we carry data of something like well over 100 thousand meteorological stations around the world. And that data is marshalled depending on what weather element it is. So it could be temperature, it could be humidity. It could be ground temperature and so forth.
“In our database we support something over, I don’t know, 100 weather elements. But then each of those weather elements, weather variables is further marshalled into up to something of the order of 50 different data quality types. And that might be the matter of data quality respecting reporting conventions, or respecting where that data has come from.”
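The organisation described in the quote, with data held per station, per weather element and per data quality type, can be sketched as a keyed store. This is only an illustration: the class names, the element and quality-type labels, the station identifier and the values are all hypothetical, and nothing here reflects the firm's actual platform.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SeriesKey:
    """Identifies one time series (all field values are hypothetical labels)."""
    station_id: str
    element: str        # e.g. "tmax", "humidity", "ground_temperature"
    quality_type: str   # e.g. "raw", "cleaned"


class ObservationStore:
    """In-memory store mapping each series key to its dated values."""

    def __init__(self):
        self._series = {}

    def add(self, key, day, value):
        # Create the series on first use, then record the value for that day.
        self._series.setdefault(key, {})[day] = value

    def get(self, key):
        # Return the series in date order; an unknown key gives an empty dict.
        return dict(sorted(self._series.get(key, {}).items()))


store = ObservationStore()
raw_tmax = SeriesKey("STATION_A", "tmax", "raw")
store.add(raw_tmax, date(2020, 1, 2), 7.1)
store.add(raw_tmax, date(2020, 1, 1), 6.4)
series = store.get(raw_tmax)
```

Keying on the quality type as well as the element lets several versions of the same observation sit side by side, which matches the idea of one element being held in many quality types.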
Data cleaning
Once the data has been aggregated into the database it is cleaned. Data cleaning is an essential process in making the data ready for use in the weather derivative markets. Quality control processes are relatively similar to those of the Met Office and other data suppliers.
Weather data is prone to occasional inaccuracies caused by, for example, equipment malfunction. There can also be inconsistencies between sites and historical periods. Differences in the way that observations are taken, for example different instruments or observation processes, and changes in weather station surroundings or movement of the observation equipment all contribute to inconsistency.
Other factors contributing to inconsistency include the types of observations taken, the metadata provided and differences in the speed at which data becomes available to use from different stations.
Since data is acquired from multiple sources by the weather data supply firm, these inconsistencies need to be addressed and errors need to be checked for. The firm applies its own data cleaning processes to the data it acquires to ensure consistency, reliability and transparency.
Data cleaning begins daily at 7.30am on the day’s updated datasets. It is a full-time task for a small team of staff. When new data arrives at the firm it is subjected to a number of automated checks based on its feasibility and consistency with other sources. Missing data and potential errors are flagged up, and a team of trained meteorologists conduct additional manual checks and make any necessary modifications.
The manager of the data cleaning team has a long history of working with UK meteorological data, and has built up a high level of knowledge and expertise which informs the processes and decisions made in cleaning the data.
“We process something like 100,000 files of weather data a day, and that’s two years old that stat, from multiple providers. We interpret that data, we load it into our database. And for roughly 5,000 of those sites they then go through, essentially a semi real time quality control process. So that data will be handled by the team in the corner here.
“They then use the tools that the development team have produced that alert them to data that looks out of line – that embellish the data with everything from radar images, satellite images, they’re looking at what’s going around, you know, neighbouring stations. And every data point we produce a synthetic. So if Heathrow reports 25 degrees T-max, and Northolt and a couple of other stations around are saying 20, we’re going to say that looks wrong, and we’ll investigate and look for consistency.”
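The automated checks described above (feasibility, missing data, and comparison with a "synthetic" value derived from neighbouring stations) might be sketched as follows. The source does not say how the firm computes its synthetic values or what thresholds it uses, so the median of neighbours, the 3 degree tolerance, the plausible range and the flag names are all assumptions made for illustration.

```python
from statistics import median


def synthetic_value(neighbour_values):
    """Estimate an expected value from neighbours (median is an assumption)."""
    return median(neighbour_values)


def check_observation(value, neighbour_values,
                      plausible_range=(-90.0, 60.0), tolerance=3.0):
    """Return a list of flags for one temperature reading in degrees Celsius.

    The range and tolerance are hypothetical defaults, not the firm's settings.
    """
    if value is None:
        return ["missing"]
    flags = []
    low, high = plausible_range
    if not low <= value <= high:
        flags.append("infeasible")
    # Only compare against neighbours when some neighbour data exists.
    if neighbour_values:
        if abs(value - synthetic_value(neighbour_values)) > tolerance:
            flags.append("inconsistent_with_neighbours")
    return flags


# The example from the quote: one station reports 25 while neighbours say 20.
flags = check_observation(25.0, [20.0, 20.0, 20.0])
```

A flag here only marks the reading for investigation. As the text describes, the decision to amend it rests with the meteorologists carrying out the manual checks.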
Importantly, the original data as supplied to the firm by meteorological services and other providers is kept alongside the cleaned data as an audit trail. The manually amended data is also annotated with cleaning codes and notes explaining the decision taken.
“We’re going to put this through the microscope, and we’re going to tell you everything about this data. We’re going to give you the raw data, we’re going to give you the quality controlled data. Which means anything that’s dodgy we’ll change, and where there’s missing data, which we know will screw up your ability to price with it, we will fill in.
“You will see the audit trail of the raw and the cleaned. But we will also expose other detail that’s material. So many of these sites have moved. You mention a site that hasn’t, but most sites have wiggled about, you know, closed, moved, instruments have changed. So we can document that, and if necessary provide them with another dataset that allows them to understand that transition.”
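One way to picture the audit trail is a record that always carries the raw value next to the cleaned value, together with a cleaning code and a note. The field names, code strings, station identifier and values below are hypothetical; the source only states that raw and cleaned data are kept side by side and that manual amendments are annotated.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CleanedObservation:
    """One observation with its audit trail (all names are illustrative)."""
    station_id: str
    element: str
    day: str
    raw_value: Optional[float]        # as supplied; never overwritten
    cleaned_value: Optional[float]    # value after quality control
    cleaning_code: str = "UNCHANGED"  # hypothetical code vocabulary
    note: str = ""


def amend(obs, new_value, code, note):
    """Return a copy with a new cleaned value; the raw value is preserved."""
    return replace(obs, cleaned_value=new_value, cleaning_code=code, note=note)


original = CleanedObservation("STATION_A", "tmax", "2020-01-01", 25.0, 25.0)
amended = amend(original, 20.0, "NEIGHBOUR_CHECK",
                "Reading out of line with surrounding stations.")
```

Making the record immutable and producing amended copies means the supplied value cannot be lost by accident, which is the property the audit trail depends on.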
Data supply and use
The firm supplies a range of datasets to the weather risk market via its weather data software platform. The datasets it supplies include both historical and current weather data, and may be supplied in ‘raw’ or cleaned form.
It also provides clients with station-level forecasts, using outputs from the European Centre for Medium-Range Weather Forecasts (ECMWF) and the US National Oceanic and Atmospheric Administration’s Global Forecast System (GFS). Datasets are available from the firm on a subscription or ad hoc basis.
In some cases the firm might provide a bespoke dataset and supporting analysis for a specific weather risk contract:
“This process will be modified to take into account any definitions that are within a term sheet [terms of the contract]. So we take account of that. It may include more inspection. It may mean we have to insert a client’s prescribed checking methodology, or infill methodology. As per the terms of their contract that they’ve agreed with their counter parties.”