A scalable solution for running ensemble simulations for photovoltaic energy
Weiming Hu, Guido Cervone, Matteo Turilli, Andre Merzky, and Shantenu Jha
Published: March 22, 2023
In: Recent Advancement in Geoinformatics and Data Science, edited by Xiaogang Ma, Matty Mookerjee, Leslie Hsu, and Denise Hills
ABSTRACT
This chapter provides an in-depth discussion of a scalable solution for running solar energy production ensemble simulations. Generating a forecast ensemble is computationally expensive, but with the Analog Ensemble technique, forecast ensembles can be generated from a single deterministic run of a weather forecast model. Weather ensembles are then used to simulate 11 different 10 kW photovoltaic solar power systems to study the simulation uncertainty under a wide range of panel configurations and weather conditions. This workflow has been developed and tested at scale on the National Center for Atmospheric Research supercomputer, Cheyenne, with more than 7000 concurrent cores.
Results show that spring and summer are typically associated with greater simulation uncertainty. Optimizing the panel configuration based on the individual performance of simulations under changing weather conditions can improve the accuracy of simulations by more than 12%. This work also shows how panel configuration can be optimized based on geographic locations.
1. INTRODUCTION
1.1. Transition in Progress and the Great Challenge
The reliance on fossil fuels for power generation is not sustainable for meeting the increasing demand of a growing global economy and population. Building a sustainable society requires alternative sources of power production that can last for generations to come (Lewis and Nocera, 2006). Thanks to new incentives, regulations, improvements in technology, and development of a competent workforce, paired with a general shift in sentiment against pollution and greenhouse emissions, renewable sources account for a significant portion of the overall energy production portfolio.
In 2019, for the first time in history, more energy in the U.S. was produced from renewable sources than from coal (U.S. Energy Information Administration, 2021). Since the early 2000s, the general trend of energy production in the U.S. has shown a steady decline in coal, which is offset by an increase in natural gas and renewable sources. Predictions by the U.S. Energy Information Administration (EIA) for 2050 suggest that 36% of energy will be generated using natural gas (37% in 2019), 38% from renewables (19% in 2019), 12% from nuclear (19% in 2019), and only 13% from coal (24% in 2019) (U.S. Energy Information Administration, 2010). Therefore, while natural gas is predicted to remain almost constant over the next 30 years, renewables will double to offset coal and nuclear power. Specifically, it is projected that solar generation will account for almost 80% of the increase in renewable power generation through 2050 (Dubin, 2021).
Relying on solar power, however, can be challenging, despite the massive progress in utility-scale installation and the prevalence of residential photovoltaic (PV) systems in communities. The sheer amount of available solar power does not necessarily guarantee its full utilization and integration into our electric grid. This is due to rapidly changing weather conditions, astronomical factors, and other events that alter the amount of power produced. In general, it can be said that there is greater uncertainty associated with generating power using PV because external weather conditions play a dominant role.
The key challenge is to match power production with demand, both of which are dynamic and variable. Furthermore, this must occur at different time scales that range from seconds to a day ahead.
1.2. Sources of Uncertainty
In the U.S. and most other countries, energy production planning occurs in what is referred to as a day-ahead market (Ferruzzi et al., 2016). Each day, a portfolio of energy sources is created to meet predicted demand. A bidding process sets the price of electricity, where the different producers offer to sell electricity at a specific price. Generally speaking, if a producer offers to sell a certain amount of energy at a price, not meeting the specific amount incurs a penalty, and generating more energy leads to an opportunity loss (Wen and David, 2001; Davatgaran et al., 2018). This process favors energy sources with smaller day-ahead uncertainty.
Traditional fossil fuel power sources, and some types of renewables such as hydroelectric power, have little day-ahead uncertainty because the amount of potential power generation is directly proportional to the fuel available, whether this is stored natural gas or volume of water in a reservoir. On the other hand, PV solar energy has greater uncertainty due to changing weather conditions, astronomical factors, technological constraints, and operational practices, as discussed below.
Weather: Numerical weather prediction (NWP) is used to forecast day-ahead irradiance, which depends on general weather conditions (Alessandrini et al., 2015). While NWP is effective, it is also prone to error, especially in cases of rare or extreme events. For example, clouds (Chow et al., 2015; Kleissl et al., 2013) and precipitation are important factors that change the amount of solar irradiance reaching solar panels, but their day-ahead modeling is difficult and often requires multiple parameterization schemes. Similarly, aerosols interact with the incoming solar irradiance (Wan et al., 2015; Jimenez et al., 2016) by increasing the diffuse component of insolation. Any changes in irradiance, which can either decrease or increase, lead to additional uncertainty in power production. Aerosols are present in larger quantities and thus have a stronger impact in areas with heavy air pollution and during transient events such as dust storms or volcanic eruptions. Because of the uncertainty in weather forecasts, it is currently difficult to consistently and accurately forecast day-ahead PV solar energy production.
Astronomical: PV solar energy is not available during the night, and daytime production changes throughout the year depending on sunrise and sunset times and solar elevation. While this variability can be easily forecasted and accounted for, daily optimization is still required to properly estimate future energy production.
Technological: Another source of uncertainty is intrinsically related to the photovoltaic conversion processes that PV panels rely upon to generate electricity. The efficiency of power generation varies based on both irradiance and cell temperature (Dirnberger et al., 2015; Huld and Gracia Amilo, 2015), which itself is related to ambient temperature. Wind speed also affects cell temperature through convective cooling and thus must be taken into account to make accurate predictions (Mills and Wiser, 2011; Al-Dahidi et al., 2020). Once irradiance and temperature are forecasted, the conversion to power generation can be made through a PV panel simulator. However, the nominal panel power capacity is evaluated under standard test conditions (STC), which may differ from real-world performance. For example, the STC specifies a cell temperature of 25 °C and an irradiance of 1000 W/m2 under the air mass 1.5 (AM1.5) spectrum. These conditions are only rarely, if ever, met.
Operational: Finally, panels must be maintained and cleaned, as they degrade over time, leading to a decrease in conversion efficiency. Under identical weather conditions, a panel may therefore generate different amounts of electricity depending on its physical condition.
1.3. Solar Power Forecasting
Solar power forecasting can be viewed as a two-step problem: characterizing the weather and the PV panels. There are many different methods for solving these two problems, sequentially or concurrently, and they depend on factors such as the temporal and spatial scale of the results, the temporal resolution, whether a measurable level of uncertainty is required, and the computational resources available.
Solar power forecasting methods are usually categorized by the temporal forecasting horizon (Gensler et al., 2016). Forecasting up to six hours ahead is referred to as short-term, and it is where extrapolation and persistence methods excel due to their computational efficiency and high accuracy (Pedro and Coimbra, 2012). Mid-term forecasting follows, spanning from 6 h to 72 h, where physical and/or statistical numerical modeling is considered the most effective technique. Beyond the short term, environmental conditions change rapidly enough that the immediate ambient history is no longer a good predictor of future outcomes.
For day-ahead, mid-term forecasts, which are the focus of this research, the weather part is solved using NWP models that resolve weather features and forecast future conditions (e.g., cloud cover, wind speed, and temperature). They are run at different temporal and spatial scales and can be optimized for specific locations and weather patterns. However, weather forecasts do not directly translate into the amount of power generated, and specific characteristics of the PV array and its installation, as well as operational status, must be taken into account.
The second step can be solved using solar panel numerical simulators or statistical (data-driven) methods based on past measurements. Which method to use is dictated by the specific needs and level of accuracy required. In general, simulators do not account for specific aspects of installation and location, while statistical solutions are limited to historical patterns that may underrepresent conditions, most importantly, rare or extreme events. The latter method can only be used if there is a robust history of measurements and is therefore generally unsuitable for feasibility analysis, for example, of a potential site. These statistical relationships are usually location-dependent: while they may be optimal for predicting a specific panel configuration and installation, this optimization rarely generalizes to other locations or installations. Finally, because panels degrade over time, a statistical correction is needed to capture the decrease in conversion efficiency.
Among the statistical and computational methods, Artificial Neural Network (ANN) has been used to predict both the behavior of panels and, in some cases, also the weather conditions. Previous literature (Al-Dahidi et al., 2020; Pedro and Coimbra, 2012; Chen et al., 2011; Omar et al., 2016; Sperati et al., 2016; Nitisanon and Hoonchareon, 2017; Cervone et al., 2017; Chen et al., 2017; Kumar and Kalavathi, 2018; Theocharides et al., 2020) studied the application of ANN to solar power forecasting because it can be trained to focus on a geographically confined region and optimize local solar power forecasting.
The general idea is to train an ANN to learn the statistical relationship between weather variables (input) and the observed power production (output) as an end-to-end model. The parameters of the trained model are optimized based on a preselected cost function or an objective function. This function measures the fitness of the model to the training data. Furthermore, ANN can be integrated into the Analog Ensemble (AnEn) technique (Cervone et al., 2017). ANN predictions can be treated as an additional predictor to be provided to the weather analog search. This combination approach has been found to be effective at predicting solar irradiance at three power plants in Italy.
The advantage of using an ANN is that it is typically more accurate than numerical simulators because it is optimized directly on observed data. Evaluating a trained model is also computationally much cheaper than running a numerical simulation. However, an ANN is data-hungry, meaning that it requires a large amount of historical data on weather conditions and the associated power production records. The latter is usually difficult to access because the data are proprietary. A second disadvantage is that a trained model is always conditioned on the training data, so if there is a significant update to the weather model or the ground observing station, the learned relationship may become invalid.
1.4. Quantification of Uncertainty
The simulation of a power production system is a complex process. Uncertainty arises when comprehensive information about the modeled process and the initial condition is lacking.
To simulate the amount of power generated, information regarding the weather (solar irradiance and temperature) and the panel (installation and condition) are needed. However, both are subject to a lack of information, and therefore, uncertainty. The atmosphere is a chaotic system that cannot be precisely observed, and as a result, “the prediction of the sufficiently distant future is impossible by any means” (Lorenz, 1963). Observations often have limited spatial and temporal resolutions, and NWP models only approximate physical processes and atmospheric interactions.
With regards to the simulation of panel performance, uncertainty can originate from an imprecise representation of the surrounding environment, e.g., shadowing, and from an incomplete characterization of real-world performance under a wide range of possible weather conditions, since the panels are typically tested under the STC.
Simulation ensembles are particularly helpful in quantifying forecast uncertainty. In weather forecasting, NWP models can be run many times to generate a range of possible deterministic realizations of the future state of the atmosphere. This ensemble of possible future states provides a quantifiable representation of the uncertainty for a given forecast. If ensemble members differ greatly from one another, that is, if the ensemble spread is large, the uncertainty of that forecast is high.
There are many approaches to constructing a forecast ensemble, for example, applying varying perturbations to initial state variables (i.e., wind speed and temperature), changing schemes of dynamics or parameterizations of an NWP model, or using stochastic means to perturb physical parameterizations (Berner et al., 2009; Haupt et al., 2017). A hybrid approach can also be devised by combining one or more of the aforementioned methods. However, these approaches involve running NWP models multiple times, which is computationally expensive.
The analog-based approach, AnEn, has been proposed (Delle Monache et al., 2013) to construct forecast ensembles without extra runs of NWP models. Because it requires only a single run of the deterministic model, it already offers a great reduction in the computational cost of ensemble generation. It has been successfully applied to a series of forecasting problems, including surface variables (Delle Monache et al., 2013; Junk et al., 2015a; Hu and Cervone, 2019; Alessandrini et al., 2019) and renewable energy production (Vanvyve et al., 2015; Alessandrini et al., 2015; Junk et al., 2015b; Cervone et al., 2017; Alessandrini and McCandless, 2020). An in-depth introduction to the AnEn is provided in Section 3.1.
1.5. Scope and Organization
This chapter introduces a scalable workflow that focuses on evaluating solar power production and its predictability throughout diverse regions. Instead of relying on ground observations, an NWP model is coupled with a PV solar power simulation system that tests the predictability of 11 different PV modules. These PV modules vary in many aspects including material, size, and efficiency.
Power forecasts are generated by using an AnEn ensemble of NWP forecasts that characterize future weather conditions, where each member of the ensemble is used as input for the panel simulator. Therefore, for any given day, 21 weather conditions are each used to generate 11 power predictions, where 21 is the size of the weather ensemble and 11 is the number of panel configurations. This is repeated for all grid points over the continental United States (CONUS) (~60,000 grid points) for every daylight hour of 2019, leading to over 60 trillion computations. The system is verified against the analysis fields of the weather model, which are the closest complete, high-resolution geographic estimates of real-world weather conditions.
Results show that PV modules have various levels of predictability and should be chosen to optimize both the power production and the predictability to support solar energy penetration. This workflow is designed to be scalable because of the very large number of computations required, which can scale significantly if testing more panel configurations or further increasing the spatial and temporal scope or their resolutions.
The rest of the chapter is organized as follows. Section 2 introduces the research data used in this project, and Section 3 describes the main methodology of generating ensemble forecasts with the Parallel Analog Ensemble (PAnEn) (Hu et al., 2020) and its integration with a power simulation system implemented with PV_LIB (Holmgren and Groenendyk, 2016; Holmgren et al., 2018). Section 4 provides the results of solar power simulation from a total of 11 PV modules, Section 5 discusses the design of the scalable workflow with the RADICAL Ensemble Toolkit (EnTK) (Balasubramanian et al., 2016), and Section 6 provides a summary.
2. RESEARCH DATA
2.1. The North American Mesoscale Forecast System
The North American Mesoscale Model (NAM) is a weather forecast model operated by the National Centers for Environmental Prediction (NCEP) that aims to provide short-term, deterministic forecasts of weather variables at a variety of vertical levels and on multiple nested domains. Currently, the Weather Research and Forecasting (WRF) Nonhydrostatic Mesoscale Model (NMM) is run as the core of NAM. To make the subsequent discussion easier, we use three names interchangeably hereafter: NAM, WRF, and NMM.
In this study, the parent domain (NAM-NMM) is used, with a 12 km spatial resolution. The production run provides forecasts with an 84 h forecast lead time (FLT). The first 36 time points are hourly, and the rest are every 3 h. The parent domain run is initialized with a 6 h data assimilation (DA) cycle, and it is updated hourly using the hybrid variational-ensemble gridpoint statistical interpolation (GSI) and the NCEP global ensemble Kalman filter method. The analysis product is usually referred to as NAM-ANL.
In this study, NAM-NMM forecasts initialized at 00 UTC and NAM-ANL analyses initialized at 00, 06, 12, and 18 UTC were used, with data collected between 2017 and 2019. Since the day-ahead PV energy market is of particular interest and the CONUS covers four time zones, we focused on the FLTs from 6 h to 32 h of NAM-NMM. This lead time period covers from 01:00 (the day of initialization) to 03:00 (the next day) in the Eastern time zone and from 22:00 (the day before the initialization date) to 00:00 (the next day) in the Pacific time zone. To simplify the computation and analysis, there was no compensation for daylight saving time when calculating the local time. NAM provides a few hundred variables; Table 1 summarizes those used in this study.
Figure 1A shows the NAM orography, and Figures 1B–1D show the annual average of daily maximum temperature, daily maximum global horizontal irradiance (GHI), and cloud cover, respectively, for 2018. The model orography reflects well the elevation variation at and to the west of the Rocky Mountains and at the Appalachian Mountains. Spatial features of the Coastal Ranges are also visible on the map thanks to the high resolution.
Figures 1B and 1C show the annual average of daily maximum temperature and GHI, respectively. Surface temperature exhibits typical longitudinal patterns with exceptions in regions with drastic topographic changes. GHI correlates strongly with temperature, exhibiting higher levels at the Rocky Mountains, the Great Basin, and the Coastal Plains near Florida in comparison to the other regions. Conversely, the Coastal Ranges in western Washington and western Oregon show a low level of annual GHI. This can be explained by the annual average cloud cover, as shown in Figure 1D. Significant cloud cover is observed near these regions, which blocks the incoming solar radiation.
2.2. International Energy Conservation Code Climate Zones
It is important to evaluate and compare power production and predictability under different climatological conditions. This study adopts the climate zones defined in the International Energy Conservation Code (IECC) as the criterion for determining clusters within the CONUS, as shown in Figure 2. IECC is a set of building codes defined by the International Code Council, which is funded by the U.S. Department of Energy. Its main purpose is to establish the minimum design and construction requirements for energy efficiency. It assigns codes at a county level, and each code typically consists of two parts: a numeric digit and a letter. The number is determined based on the thermal criterion, which is the accumulated daily temperature anomaly from a predefined temperature threshold. Codes 1 to 8 typically correspond to places that are very hot (in need of cooling) to very cold (in need of heating). The other part of the code consists of either A, B, or C, which correspond to the major climate types of moist, dry, and marine, respectively. This classification considers both temperature and precipitation. In the case of the CONUS, thermal conditions change longitudinally while climate types generally change latitudinally.
3. METHODOLOGY
PV energy production is closely coupled with meteorological conditions (in particular GHI and temperature) and solar panel configuration (panel type, location, health, and installation). A complete predictability assessment of the PV energy production essentially needs to examine both the meteorological factors of weather forecasting and the engineering factors of solar panel performance. This section introduces the framework designed for assessing the large-scale predictability of PV energy production with the multivariate AnEn and a simulated performance system.
3.1. Analog Ensemble and the Extension to Multivariate Forecasts
Ensemble simulation is an effective approach for assessing the predictability of a weather process. However, generating a forecasting ensemble is computationally expensive. It usually involves multiple model runs, each of which requires significant computational resources. The AnEn generates forecast ensembles by relying on a single run of a deterministic model and historical observations. It is an efficient solution for ensemble generation in comparison to other methods that require multi-initialization or multiple models. It is parallelizable and scalable because of the independent ensemble generation at an individual time and place. An in-depth description of how AnEn works is provided below.
The AnEn generates forecast ensembles based on the assumption that similar weather forecasts exhibit similar biases, and by using the observations, these biases can be corrected within an ensemble. The most important component of the AnEn is a forecast similarity metric (Delle Monache et al., 2013), which is defined using the following equation:

$$\left\| F_t, A_{t'} \right\| = \sum_{i=1}^{N} \frac{\omega_i}{\sigma_i} \sqrt{\sum_{j=-\tilde{t}}^{\tilde{t}} \left( F_{i,t+j} - A_{i,t'+j} \right)^2} \qquad (1)$$

where $\left\| F_t, A_{t'} \right\|$ represents the distance, or the dissimilarity, between a multivariate, deterministic target forecast, $F_t$, at time $t$, and an analog forecast, $A_{t'}$, at a historical time $t'$. The right side of the equation is a weighted, normalized average of the differences for each predictor variable. $N$ is the number of predictor variables; $\omega_i$ is the weight associated with predictor $i$; and $\sigma_i$ is the standard deviation, used as a normalizing factor, for predictor $i$. $F_{i,t+j}$ is the target forecast at time $t+j$ for predictor $i$, and similarly, $A_{i,t'+j}$ is the historical analog forecast at time $t'+j$ for predictor $i$. Finally, $\tilde{t}$ is the half size of the temporal window for trend comparison. This parameter considers forecast similarity within a small time window rather than only at a particular point in time, improving the quality of the identified weather analogs. In practice, the half size of the time window is usually set to 1 h, meaning that values from the previous, the current, and the next hours are compared when calculating the dissimilarity at the current point in time.
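To make the metric concrete, the following minimal sketch computes Equation 1 for one target-analog pair, assuming the predictor windows have already been extracted as NumPy arrays; the function and variable names are illustrative and do not reflect the PAnEn API.

```python
import numpy as np

def anen_dissimilarity(target_window, analog_window, weights, sigmas):
    """Dissimilarity between a target forecast and one historical analog
    (Equation 1). Both windows have shape (N, 2 * t_half + 1): N predictor
    variables over the short temporal comparison window."""
    target_window = np.asarray(target_window, dtype=float)
    analog_window = np.asarray(analog_window, dtype=float)
    # Sum of squared differences over the temporal window, per predictor
    sq_diff = np.sum((target_window - analog_window) ** 2, axis=1)
    # Weighted, normalized aggregation across the N predictors
    return float(np.sum(np.asarray(weights) / np.asarray(sigmas) * np.sqrt(sq_diff)))
```

In practice, this value is computed against every forecast in the search repository, and the smallest distances identify the analog forecasts.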
One of the important inputs for AnEn is the predictor weight (ωi). Usually, it is optimized using a brute force search algorithm that experiments with all combinations of weights for each predictor, for example, from 0 to 1 with an increment of 0.1. A useful trick is to force all predictor weights to add up to 1 to require less computation. However, it is, by nature, a computationally expensive task and not suitable for large simulations. In this study, we propose alternative optimization methods based on clustering.
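As an illustration of how large the brute-force search space is (this is a generic sketch, not PAnEn's built-in optimizer), the following code enumerates candidate weight vectors on a 0.1 grid under the sum-to-one constraint:

```python
from itertools import product
import numpy as np

def candidate_weights(n_predictors, step=0.1):
    """Enumerate weight vectors on a regular grid (0 to 1 in `step`
    increments) that sum to 1, mirroring the brute-force search."""
    levels = np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1)
    for combo in product(levels, repeat=n_predictors):
        if abs(sum(combo) - 1.0) < 1e-9:
            yield combo

# With 4 predictors and a 0.1 increment there are 286 candidate vectors,
# each of which would require a full re-run of the analog search to evaluate.
print(sum(1 for _ in candidate_weights(4)))
```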
For AnEn to work properly, a set of forecasts and the corresponding observations are required. The time period of the data is usually two years or longer, with at least one year serving as the historical period, namely the search repository, and the rest of the time serving as the forecast period, namely the test period. At least one year of data in the search repository is needed to account for seasonal and annual variations and to ensure that good weather analogs can be found. The length of the historical archive required is much shorter than that of other analog-based forecasting techniques (Van den Dool, 1994; Hamill and Whitaker, 2006) because, with the AnEn, weather analogs are identified within a highly restricted spatial location, independently at each grid point, and within a short time window, usually a few hours. As a result, the degrees of freedom in how forecasts can differ from one another are largely constrained both in space and time, making it easier to identify weather analogs. The general rule still holds, however: better weather analogs can be found given a longer search repository.
The operational AnEn is a variant of this method. In this setting, the size of the search repository is not fixed. Rather, it grows by incorporating forecasts from the test data set as soon as each target forecast becomes historical. In other words, each target forecast in the test data set has a different length of search history that incorporates all immediately preceding forecasts. This alternative setup helps in cases where only a limited length of search history is available.
After forecast analogs have been identified for a particular target forecast, the AnEn constructs the final ensemble by directly using the observations associated with the forecast analogs since the forecast analogs are identified from the historical archive. The historical observations comprise the AnEn forecast ensemble. This process is repeated for all geographic locations, all target forecasts in the test data set, and all forecast lead times to generate the complete set of forecast ensembles. This process is highly parallelizable and scalable because the ensemble generation of forecasts is independent in space and time. This is also referred to as an “embarrassingly” parallelizable task (Cervone et al., 2017; Hu et al., 2020).
However, while the spatial and temporal independence brings great advantages in computation, it leads to inconsistent and sometimes unrealistic predictions when ensemble members are analyzed individually (Sperati et al., 2017; Alessandrini and McCandless, 2020). This is because the first member in the forecast ensemble at location i may not necessarily be related to the first member in the forecast ensemble at a nearby location, say i + 1. The generation of these two ensembles is independent. The member rank follows a statistical process to determine the distance between the target and the analog forecasts rather than any physical linkages or processes. As a result, plotting a geographic map using the first member from ensembles on a particular day leads to patchy and unrealistic features. This problem can be effectively addressed by using a form of summarization of ensemble members, e.g., the mean or the median. If the ability to analyze each ensemble member is truly desired, the AnEn members need to be shuffled with the Schaake shuffle (SS) technique (Clark et al., 2004; Voisin et al., 2010; Sperati et al., 2017; Clemente-Harding, 2019). Ensemble spread and standard deviation can be calculated if the uncertainty of ensembles is to be evaluated. A kernel density estimation function can also be applied to the AnEn members to construct the probability distribution function (PDF) for probabilistic forecasts. In practice, AnEn forecast ensembles are usually analyzed in their deterministic or probabilistic forms, rather than by the individual member in space and time.
The AnEn technique for predicting a univariate target has been well studied under the presumption that observations of the variable are available. This scheme does not apply when using the AnEn to generate forecasts for initializing the PV power simulation, because the power simulation depends on multiple variables, including GHI, temperature, wind speed, and albedo. Simplistically, one would argue that multivariate forecasts can be generated by running the AnEn multiple times with each generation focusing on one variable at a time. However, this argument does not stand, because each AnEn generation is independent and thus, the multivariate forecasts will again suffer from having inconsistent and unrealistic predictions in space and time. In this case, the SS does not help, because it is developed for a single run of AnEn.
In this study, we propose the multivariate AnEn and extend the application of AnEn to multivariate forecasts with a single pass of the AnEn technique. The multivariate AnEn still relies on the same similarity metric, as defined in Equation 1, to identify analog forecasts for target forecasts. After analog forecasts have been selected, the multivariate ensemble generation shares the same set of analog forecasts. This constrains the degrees of freedom when generating multivariate AnEn forecasts, so that a multivariate AnEn member is only possible if it has been observed as a whole in the past.
Figure 3 shows the schematic diagram for generating a four-member, multivariate AnEn. Forecast and observational data sets are represented as directed arrows to indicate that their sizes may be variable, for example, as in the operational AnEn. The horizontal direction represents time, and it is divided into the search period (shown in gray) and the test period (shown in white). The first arrow at the top represents the NAM-NMM. By the AnEn convention, it is a multivariate, deterministic forecast (recall the parameter N in Equation 1). The other three arrows below represent three observational data sets that are aligned in time with the NAM-NMM series. The algorithm is described below (a code sketch follows the list):
(1) A multivariate forecast from the test period is selected for which analog forecasts are desired.
(2) Four historical forecasts from the search period with the highest similarity (the shortest distance) are identified.
(3) The associated observational values are retrieved from a particular observational data set to form the forecast ensemble. This process is repeated for all of the available observational data sets until a complete set of multivariate AnEn has been generated.
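A minimal sketch of steps (1)-(3) is given below, reusing the anen_dissimilarity function from the sketch in Section 3.1; array shapes and names are illustrative assumptions rather than the PAnEn implementation.

```python
import numpy as np

def multivariate_anen(target_window, history_windows, obs_datasets,
                      weights, sigmas, n_members=4):
    """Sketch of steps (1)-(3): find the n_members most similar historical
    forecasts and reuse their times for every observed variable.

    target_window   : (N, window) array for the selected test forecast.
    history_windows : (T, N, window) array of historical forecast windows.
    obs_datasets    : dict of variable name -> length-T observation array
                      aligned with the historical forecasts.
    """
    # Step (2): rank historical forecasts by the similarity metric (Equation 1)
    distances = np.array([anen_dissimilarity(target_window, hw, weights, sigmas)
                          for hw in history_windows])
    analog_times = np.argsort(distances)[:n_members]
    # Step (3): the same analog times index every observational data set,
    # which keeps the multivariate ensemble members physically consistent
    return {var: np.asarray(obs)[analog_times]
            for var, obs in obs_datasets.items()}
```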
The parameterization of the multivariate AnEn stays the same as for the univariate AnEn, for example, having a 1 h window half size and favoring a longer search period, with one exception regarding predictor weights. Usually, predictor weights can be optimized based on an error metric of the variable of prediction, namely the predictand, for example, the root mean square error (RMSE) or the continuous ranked probability score (CRPS). But the optimization objective is slightly different in the case of the multivariate AnEn, because each predictand may be associated with a different set of optimized weights if the optimization is conditioned only on that individual predictand. In other words, weights for predicting GHI could be very different from weights for predicting wind speed. We, therefore, evaluate the performance of predictor weights based on the forecast error of the final product, the simulated PV power production, by running a power simulation system with the AnEn forecasts.
3.2. Scalable Ensemble Simulation for Photovoltaic Energy Production
The most critical input for PV power simulation is GHI, which reflects cloud cover and precipitation, while the power production also depends on the configuration of solar panels and other weather variables, including ambient temperature and wind speed. Section 3.1 discussed the solution for generating multivariate weather ensembles; this section describes the second half of the power simulation workflow and how it is run at scale.
There are a variety of software packages available for PV system performance simulation, including PVsyst, SAM (Blair et al., 2014), PVWatts, and PV_LIB (Holmgren and Groenendyk, 2016; Holmgren et al., 2018). The open source Python package PV_LIB was chosen for this study for several reasons. First, being open source, it allows in-depth scrutiny of intermediary model results and configurations, and the source code is readily available for study. Second, it modularizes panel configurations so that comparative analyses can be conveniently conducted using varying models and assumptions. Finally, its well-designed application programming interface (API) enables straightforward integration with external workflows; in our case, the ensemble simulation of PV power production was integrated with AnEn.
In general, there are five steps for simulating the PV power production at a given location and time.
(1) Extraterrestrial irradiance calculation: First, the astronomical position of the Sun relative to a particular location on the surface of the Earth needs to be calculated. This relative position can be used to calculate the extraterrestrial irradiance as well as the solar zenith angle. The process is largely deterministic because the orbital geometry of the Earth and Sun is well established.
(2) GHI decomposition: The GHI needs to be decomposed into the Direct Normal Irradiance (DNI) and the Diffuse Horizontal Irradiance (DHI). The DNI is the amount of solar radiation per unit area reaching a surface that is positioned perpendicular to the sunlight coming directly from the Sun. In production, this component can be further exploited by adopting solar tracking technology, where the surface of the panel is adjusted in real time to directly face the Sun. The DHI is the amount of radiation per unit area reaching a surface that does not follow a direct path but rather has been scattered by molecules or particles in the atmosphere. GHI, DNI, and DHI are interdependent, and any one of the variables can be derived from a combination of the other two. The DISC model is used to estimate the DNI from the GHI; it applies an empirical relationship between the GHI and the DNI based on the global and direct clearness indices.
(3) Incident irradiance estimation: After the GHI has been decomposed, these irradiance estimates need to be converted to their corresponding in-plane components, sometimes referred to as the plane-of-array irradiance or the incident irradiance. This calculation is necessary to determine the contributions of the various components to the final power output of a PV system.
(4) Cell temperature estimation: PV cell temperature needs to be estimated from the weather conditions because it affects the efficiency of solar panels. Cell temperature is highly correlated with the ambient temperature but is also affected by panel materials (for example, glass or polymer) and wind speed.
(5) Power production estimation: Finally, the power output can be calculated based on the plane-of-array irradiance, the PV cell condition, and the specific PV system configuration, such as the installation and panel tilt. Since this study investigates the predictability of different solar panels rather than the optimization of a particular panel setup, it is assumed that solar panels are always facing upwards parallel to the ground.
See Appendix A for a code snippet that implements the above workflow with PV_LIB. It provides an example of simulating power output at a given location and time.
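For readers without the appendix at hand, the sketch below strings the five steps together with pvlib (v0.8-style API). It uses the simpler PVWatts DC model in step (5) as a stand-in for the Sandia module model referenced in Table 2, and it assumes flat, upward-facing panels as in this study; parameter values such as pdc0 and gamma_pdc are illustrative assumptions.

```python
import pvlib

def simulate_power(times, latitude, longitude, ghi, temp_air, wind_speed,
                   pdc0=10_000, gamma_pdc=-0.004):
    """Steps (1)-(5) for a flat, upward-facing system. `times` is a tz-aware
    DatetimeIndex; ghi, temp_air, and wind_speed are aligned pandas Series."""
    # (1) Solar position and extraterrestrial irradiance
    solpos = pvlib.solarposition.get_solarposition(times, latitude, longitude)
    dni_extra = pvlib.irradiance.get_extra_radiation(times)

    # (2) Decompose GHI into DNI (DISC model) and DHI
    dni = pvlib.irradiance.disc(ghi, solpos['apparent_zenith'], times)['dni']
    dhi = ghi - dni * pvlib.tools.cosd(solpos['apparent_zenith'])

    # (3) Transpose to plane-of-array (incident) irradiance; tilt = 0
    poa = pvlib.irradiance.get_total_irradiance(
        surface_tilt=0, surface_azimuth=180,
        solar_zenith=solpos['apparent_zenith'], solar_azimuth=solpos['azimuth'],
        dni=dni, ghi=ghi, dhi=dhi, dni_extra=dni_extra)

    # (4) Cell temperature from ambient temperature and wind speed
    params = pvlib.temperature.TEMPERATURE_MODEL_PARAMETERS[
        'sapm']['open_rack_glass_glass']
    t_cell = pvlib.temperature.sapm_cell(
        poa['poa_global'], temp_air, wind_speed, **params)

    # (5) DC power from incident irradiance and cell temperature
    return pvlib.pvsystem.pvwatts_dc(poa['poa_global'], t_cell, pdc0, gamma_pdc)
```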
There is one important remark regarding the integration of the aforementioned workflow into the ensemble simulation. If the complete process were carried out for each ensemble member separately, the calculation of the extraterrestrial irradiance at a given location and time would be repeated, because all members share the same location and time. This calculation takes up roughly 8% of the total runtime of the minimal workflow, with variation depending on the platform, so the cost is non-trivial. In the proposed scalable workflow, the solar position and the extraterrestrial irradiance are therefore pre-computed only once for all locations and times of prediction, and these precomputed results are then reused during the actual power simulation stage.
Figure 4 shows the proposed framework for integrating the multivariate AnEn, as implemented by PAnEn (Hu et al., 2020), and the performance simulation system, as implemented with PV_LIB. It consists of a control power simulation run initialized with NAM-ANL and a batch of experimental power simulation runs initialized with AnEn and NAM-NMM. This workflow is highly scalable because the generation of AnEn can be partitioned into geographic sub-domains and run concurrently. The computation for each sub-domain can be done independently, which leads to low communication overhead. This parallelization is made possible by the Message Passing Interface (MPI), the de facto standard for distributed computing, and the workload manager RADICAL EnTK, developed by the RADICAL Research Group at Rutgers University. A detailed discussion of EnTK is provided in Section 5.
Finally, the workflow is repeated for 11 different solar panel types. There are, in total, 523 built-in solar modules in PV_LIB (as of pvlib v0.8.1) provided by the Sandia module database and 21,535 from the Clean Energy Council module database. Even with access to supercomputers and a large allocation, it is prohibitive to simulate all of the modules. It is also unnecessary, because while all panels are different, they can be clustered into groups of similarly behaving panels.
Clusters were generated using the size of the solar panel, the total number of PV cells in series, and the STC power rating of the panels in the Sandia module database. A clustering analysis was carried out to identify 11 module clusters, and the module with the maximum STC rating was then selected from each cluster. A summary of the selected solar modules is provided in Table 2.
4. RESULTS AND DISCUSSIONS
Hourly day-ahead PV power production simulation for 2019 with a spatial resolution of 12 km was completed throughout the CONUS. Because the power production at night is zero, only daylight hours were used. This leads to a varying number of forecasts per day, because the length of daylight varies by location and throughout the year.
In this study, the weather ensemble generated has 21 members and covers a two-year search period starting on 1 January 2017 using the operational mode. Recall that in the operational mode, the search period accumulates as the test time moves forward. Each of the 21 members is used to simulate power production. The following subsections show results for the weight optimization of the multivariate AnEn generation and for the power simulation ensembles.
4.1. Weight Optimization for Multivariate Analog Ensemble Generation
In this study, weight optimization follows the practice described in Section 3.1: candidate weights are evaluated based on the error of the simulated power production. However, having a single set of equal predictor weights (hereafter called equal weighting [EW]) applied over the entire CONUS is not optimal, because the large domain coverage calls for different optimal weights in different places. On the other hand, optimizing for each of the 56,776 grid points is too taxing computationally. As a compromise, two alternative approaches are discussed to set weights for each of the grid points.
(1) The first approach optimizes weights from a sample of equally spaced grids within the CONUS, and each grid is assigned the weights of the nearest sample point (hereafter called nearest neighbor [NN]). The sample points have a latitudinal spacing of 4.5° and longitudinal spacing of 3.5°.
(2) The second approach uses a hierarchical clustering algorithm based on four features, including orography, annual GHI, temperature, and the average cloud cover. The rationale is that these four parameters inform the geographic and meteorological regimes (hereafter called regime based [RB]).
For both methods, the CRPS is the metric, or objective function, being minimized. It is computed at the center points of the NN sample grids or, for RB, at a random sample of points from each cluster, using data from 2018. The CRPS is calculated based on the simulated power generation from the PV system rather than on the GHI, because power is the real quantity of importance for this research. Rather than trying all 11 panels, the optimization is performed on a single PV module, specifically the Silevo Triex U300. This module was chosen due to its relatively large power capacity of 300 W and its popularity in real-world installations.
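For reference, the CRPS of an ensemble forecast against a single verifying value (here, the analysis-driven power simulation) can be estimated empirically as sketched below; this generic estimator illustrates the objective being minimized and is not necessarily the exact implementation used in this study.

```python
import numpy as np

def crps_ensemble(members, observation):
    """Empirical CRPS of one ensemble against a single verifying value,
    using CRPS = E|X - y| - 0.5 * E|X - X'| over ensemble members X."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - observation))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# Example: a 21-member power simulation ensemble versus the analysis-driven run
print(crps_ensemble(np.random.normal(5000, 500, size=21), 5200.0))
```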
Figure 5 shows how the clusters are defined. The hierarchical clustering algorithm is an alternative to the well-known K-means algorithm, and it differs in not requiring the number of clusters as input. Rather, the number of clusters can be chosen afterward based on point distances.
There are two sub-categories of the hierarchical clustering: divisive and agglomerative (Pedregosa et al., 2011; Nielsen, 2016). In agglomerative clustering, the algorithm starts by assigning each grid point to a cluster and then follows an iterative process in which the two closest clusters are merged. Different metrics can be used to define the distance between clusters, for example, using the shortest pair of points from two clusters or using the average distance between two clusters. The first definition is used when there are clear-cut shapes in the data, whereas the second definition is used when geographic locations are naturally correlated and variables tend to change smoothly (Ferstl et al., 2016). Because of the high spatial correlation among grid points, this research uses the average distance between two clusters.
An effective way to visualize the process of hierarchical clustering is through a dendrogram (Fig. 5A). Clusters that are closer to one another are drawn closer along the horizontal axis. The clustering is an iterative process, meaning that, by the end of the process, there will be only one big cluster consisting of all the grid points. The height of a horizontal line connecting two clusters indicates the distance between the merged clusters. This figure helps to decide the number of final clusters by providing information on how close the clusters are and how many points each cluster would have. We, therefore, generated 15 clusters. Figures 5A–5C show the relationships among the variables used for clustering. The general correlations between variables are well expected, for example, a negative correlation between temperature and orography or between cloud cover and GHI, and a positive correlation between GHI and temperature. The clusters separate the regimes effectively; for example, cluster #15 corresponds to a regime of high elevation, low temperature, and a high level of GHI.
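The clustering step can be reproduced in outline with SciPy's agglomerative, average-linkage routines, as sketched below with placeholder feature values; the actual features are the four gridded fields described above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix: one row per grid point with orography,
# annual GHI, temperature, and average cloud cover (values are synthetic)
features = np.random.rand(500, 4)

# Standardize so that no single variable dominates the distance calculation
X = StandardScaler().fit_transform(features)

# Agglomerative clustering using the average distance between clusters
Z = linkage(X, method='average')

# Cut the dendrogram into 15 clusters, as chosen in this study
labels = fcluster(Z, t=15, criterion='maxclust')
```

Calling scipy.cluster.hierarchy.dendrogram(Z) produces a merge-height plot analogous to Figure 5A, which can guide the choice of the final number of clusters.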
Figure 6 arranges the NN and the RB clustering within the geographic context. Figure 6A shows the equally spaced grid for each cluster. The red point shows the centroid location of the overlapped region for which weights will be optimized. Figure 6B shows the RB clusters, and the black points show the sample locations where the weights are optimized. The total number of sample points from the two approaches remains similar, with 71 for the NN and 70 for the RB. Figure 6B shows consistency with the IECC climate zones, which are shown in Figure 2. However, the determination of regimes is more dependent on solar-related features, while the climate zones are mostly defined by temperature and precipitation.
A verification across space and time was used to evaluate the effectiveness of the optimized weights. Figure 7 shows the RMSE for four prediction methods: the baseline model NAM and three different AnEn results, each characterized by a different weighting strategy. The RMSE is calculated based on the power simulation, meaning that both NAM and AnEn forecasts were used as input to the panel simulator. The model analysis was also fed into the simulator to generate the most realistic estimate, which serves as the “ground truth.”
NAM forecasts are computed in UTC times, which is not convenient when comparing large regions. Because the CONUS effectively spans four time zones, significant artifacts are introduced when comparing far away regions like the East and West Coasts. For example, generating forecasts for 17h00 UTC effectively means comparing forecasts for noon local time in New York with 09h00 local time in Los Angeles.
To avoid these inconsistencies, all computations are made in local time. However, rather than using the time zone maps to carry out the time conversion, which adds an artificial transformation between solar time and a geographical region, the real solar noon is computed at each location, and the forecast lead times are realigned so that the forecast noon is the closest solar noon. Solar noon is the time when the Sun passes a location’s precise meridian and reaches its highest elevation in the sky. This time corresponds to the highest level of solar extraterrestrial irradiance, and on a clear sky day it also corresponds to the highest level of surface irradiance. Because NAM provides hourly forecasts, the time difference between the solar noon and the forecast noon is at most ± 30 min.
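The realignment can be illustrated with pvlib by selecting, at each grid point, the hourly forecast lead time whose valid time has the highest solar elevation (i.e., is closest to solar transit); the function below is a simplified sketch, not the workflow's actual implementation.

```python
import pandas as pd
import pvlib

def noon_lead_time(init_time_utc, latitude, longitude, flts=range(6, 33)):
    """Return the forecast lead time (hours after initialization) whose valid
    time is closest to local solar noon, i.e., has the highest solar elevation."""
    valid_times = pd.DatetimeIndex(
        [init_time_utc + pd.Timedelta(hours=h) for h in flts])
    solpos = pvlib.solarposition.get_solarposition(valid_times, latitude, longitude)
    return list(flts)[int(solpos['apparent_elevation'].values.argmax())]

# Example: a 00 UTC initialization near New York City maps the forecast noon
# to roughly FLT 17 (about 13:00 local daylight time in June)
init = pd.Timestamp('2019-06-01 00:00', tz='UTC')
print(noon_lead_time(init, 40.7, -74.0))
```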
Figure 7A shows the RMSE averaged across the CONUS as a function of lead time for the AnEn ensemble means of EW, NN, and RB, and for the deterministic forecasts of NAM-NMM. On average, the RMSE of NAM-NMM peaks at solar noon with a value of 100 W/m2. EW AnEn reduces the prediction error by 11.83%, from 100 W/m2 to 88.42 W/m2. However, using only equal weights to search weather analogs is sub-optimal. AnEn NN and RB show results from the two optimizing schemes tested.
NN reduces the prediction error of NAM-NMM by 14.41%, from 100 W/m2 to 85.85 W/m2, and RB reduces it by 16.26%, from 100 W/m2 to 83.99 W/m2. While the improvement is largest at solar noon, the same trend holds for the other lead times, with RB, NN, EW, and NAM consistently ordered from lowest to highest error (Fig. 7A).
Figure 7B shows the prediction bias of the four methods. Bias is a verification metric that captures whether a weather model systematically over- or under-predicts. Ideally, predictions should have a bias of zero, meaning that the forecasted ensemble mean always overlaps with the observed average. A positive bias (high bias) indicates that the forecast is too high. A negative bias (low bias) thus indicates the opposite.
In terms of solar irradiance forecasts, NAM-NMM generally exhibits a negative bias during most of the daytime but shifts to a positive value during the afternoon. The peak value is a negative bias that occurs one hour before solar noon. Because the RMSE is increasing during this time while the bias improves slightly, an increase in random error is offsetting the bias improvement during this period.
All three AnEn configurations have smaller biases. At solar 11:00, when the bias of NAM-NMM is at its worst, the biases of the four methods are −15.87 W/m2 (NAM), −7.34 W/m2 (EW), −5.65 W/m2 (NN), and −4.83 W/m2 (RB).
Figure 8 shows the RMSE maps of the entire domain for the four methods. Each pixel shows the yearly average of the RMSE over all daylight lead times. Prominent in Figure 8A are two high-error regions over Florida and Southern Texas. Figures 1C and 1D show that these two regions generally receive high levels of both solar irradiance and total cloud cover. High levels of power production can be achieved because of the high solar irradiance, but the prediction error is also high due to the high cloud cover.
However, another region in the Pacific Northwest that also has high cloud cover instead shows visibly lower prediction errors than those for Texas and Florida. This is most likely due to the different amounts and variability of cloud cover. Clouds over the Pacific Northwest are regular and persistent, which is in part reflected by lower solar irradiance values (Fig. 1C). The absolute error is, therefore, smaller because the magnitude of the power output is never as high as in Florida and Texas. Another feature to note in Figure 8A is the high accuracy over Arizona and Southern California. These regions have high solar irradiance and low cloud cover, making them easier to predict.
Figures 8B–8D show, respectively, the RMSE for EW, NN, and RB. All three methods show lower errors compared to NAM-NMM overall, as well as in specific areas. This can be attributed to the independent search of weather analogs with a fine spatial scale at each grid point, and it allows the weather analogs to adapt to local weather regimes and correct the forecasts accordingly.
Figure 8 reveals one systematic problem with the NN approach. Recall from Figure 6A that, due to computational limitations, it is not feasible to run weight optimizations for each grid point. Only the center point of each region is used to compute the optimal weights, which are then reused for the entire region of roughly 4.5° × 3.5°. If a sample point that is not representative of the entire region is selected, incorrect weights are applied to many grid points. The negative impact on prediction error is most noticeable for the 14th, 19th, 27th, and 58th regions in Figure 6. These regions have larger RMSE values and are surrounded by areas with noticeably lower errors. On the other hand, RB, which uses four variables (topography, solar irradiance, temperature, and cloud cover) to define the regimes, has a higher homogeneity in terms of power production, which results in a smoother RMSE surface (Fig. 8D).
Figure 9 shows scatter plots with pairwise comparisons for all modes of AnEn, along with a diagonal reference line. Points below the diagonal indicate that the method on the vertical axis is better. Figure 9A compares EW, on the horizontal axis, with NN, on the vertical axis. The majority of the points lie below the diagonal line, which suggests that NN outperforms EW. However, points from the 14th and the 58th regions (Fig. 6A) are highlighted using different shapes, because they show an inverse trend, which indicates that the weight optimization failed for these regions.
These issues are likely to be solved by increasing the number of clusters and reducing their geographic area. However, this will significantly increase computation time, which is what the method is trying to avoid. Without increasing the sample points, Figure 9B shows the comparison between the EW on the horizontal axis and the RB on the vertical axis. Please recall that the numbers of sample points are comparable with 71 points for the NN and 70 points for the RB. The majority of points lies below the diagonal, which indicates the clear outperformance of RB compared to EW.
Figure 9C compares NN and RB. The difference is smaller than in the previous comparisons because both methods involve a grid search, differing only in how regions are defined and samples are drawn. Yet RB outperforms NN for regimes 10, 11, and 12, which overlap with the NN regions where non-representative sample points were chosen.
RB also exhibits error patterns related to the spatial characteristics of the regimes. For example, regime 1 covers the Southeastern CONUS, which is associated with high solar irradiance and middle-to-high cloud cover; its points cluster, in Figure 9B, at the right tail of the spectrum and have a high power simulation error. In contrast, regimes 12 and 13 cluster at the left tail of the spectrum and have a low power simulation error; these regimes are typically associated with middle-to-high solar irradiance but very low cloud cover. The clustering of errors within RB regimes indicates the effectiveness of the regimes in separating the different error patterns associated with NAM-NMM, and as a result, the predictor weights can be optimized by focusing on correcting a particular type of error.
Finally, Tables 3 and 4 compare the forecast error based on two different geographic clustering criteria. Table 3 averages errors over the regimes defined by the RB. Rows in the table are sorted by the RMSE of the RB from lowest to highest. Verification results are shown for solar noon only, because that is when the solar irradiance and the power production reach their peak.
The RB mostly achieves lower errors than the NN except for regimes 7, 9, and 14. RB has only one sample point for each of these regimes, which makes it subject to the same issue as NN. However, these regimes tend to be small clusters and therefore have a weaker impact on the overall predictability, unlike in the case of NN, where each sample point has an equal impact because every region has the same size. On average, at solar noon, the RB has the lowest prediction error. The average statistics are consistent with Figure 7A at solar noon.
It is possible that RB outperforms the other methods simply because the verification is aggregated in a way that favors how RB is optimized. To rule out this potential bias, an independent clustering, the IECC climate zones, is used to average the errors, as shown in Table 4. The total number of grid points within the IECC climate zones is slightly smaller (by 462 points, or 0.8%) than the total number of grid points within the CONUS as defined by NAM-NMM because of a slight difference in the coordinates, mostly along the coasts (not shown).
Similar results can be observed, showing RB to be predominantly the best method compared to NAM-NMM, EW, and NN. In general, the error reduction at noon is larger than the reduction during the morning and afternoon. The climate zone 3C has the lowest prediction error throughout the day; again, it is associated with high solar irradiance and low cloud cover.
We therefore conclude that RB is a more reliable and effective weight optimization method for problems where a tradeoff must be made between computing weights for a large geographic region and the limited computational power available.
4.2. Predictability Assessment of Solar Panel Configuration
Section 4.1 evaluates two approaches to weight optimization for AnEn predictions. Results have shown that RB is significantly better and more reliable for a large geographic domain. In this section, predictor weights optimized by the RB method are used to generate AnEn forecasts. Weather ensembles are then used with 11 PV modules (Table 2) to simulate power production.
PV systems are simulated with a 10 kW capacity by linearly scaling a particular type of panel connected in series. For example, according to Table 2, panel SP128 has a 400 W capacity; therefore, the simulated system contains 25 such panels in series to achieve the desired power of 10 kW. It is possible to connect PV modules in parallel to mitigate shading effects, but it requires extra wiring. Currently, connecting modules in series remains the most common practice because it generates maximum power output while also being cheaper to install. In this PV system, we do not compensate for scaling and conversion losses, which depend on the type of inverter used.
Each PV system is run with hourly forecasts from AnEn for 2019. Since AnEn forecasts have 21 members, power simulation also has 21 members for each of the 11 systems.
Figure 10 gives an example of the simulated power ensemble from SP128. The figure shows the power production at solar noon on each day in 2019. Values are averaged across all CONUS grid points. Figures 10A and 10B show a strong positive correlation between GHI and power production. However, the GHI peaks in July, while power generation enters a production plateau starting around early June. This is due to the saturation of the 10 kW PV system.
The power ensemble range has a noticeable decrease going from May into June. This decrease correlates with the change in cloud cover. The predictability is, therefore, impacted by the cloud cover condition. The pattern can also be observed by comparing the model analysis (the red line) to the ensemble median (the black dashed line). In March, when there is a significant amount of cloud cover, power simulation is less accurate, and model analysis is barely covered by the 50% range of the simulation ensemble. In July, on the other hand, when there is a low level of cloud cover, power simulation is more accurate, and the model analysis almost overlaps with the median of the simulation ensemble. This trend is also consistent for other months of the year.
Figure 11 compares the year-round power simulation error from 11 PV systems both in space and time. Figure 11A shows the total number of grid points where a particular PV system outperforms all other modules in terms of daily RMSE averaged across the CONUS; Figures 11B–11E show the geographic distribution of the outperforming panel based on seasonal RMSE.
Two types of modules stand out: KS20 and FS272. These modules have capacities of 20 W (KS20) and 72.5 W (FS272), respectively, and they are among the smallest of the 11 simulated modules. During the winter (Fig. 11B), FS272 is preferred on more than 85% of the grid points. In the Northern CONUS, FS272 is the preferred panel, while the larger KD135GX, with a capacity of 135 W, is preferred in the Southeastern CONUS. SF160S, with a capacity of 160 W, is preferred in Western Texas and Southern New Mexico. Since there is significantly more cloud cover over the Northern CONUS during the winter, these results suggest that the power production of a PV system with smaller module capacity but a larger module quantity can be better predicted.
During the spring, when solar irradiance starts to increase, larger panels begin to dominate (Fig. 11C). However, there may not be a clear winner, because the total numbers of outperforming grid points for the three panels, SF160S, KS20, and KD135GX, appear to be very similar (Fig. 11A). The region that previously preferred FS272 shifts to KD135GX, with the capacity nearly doubling from 72.5 W to 135 W. This suggests that the predictability of power production is related not only to weather conditions but also to how a particular PV module responds to changing weather conditions.
During the summer (Fig. 11D), KS20 becomes the prevailing choice thanks to its relatively small size and capacity, which make its power output easier to predict. Solar irradiance typically reaches its year-round maximum during the summer. An exceedingly high level of irradiance leads to PV modules generating more power than their nominal capacity, but also operating at a higher than usual cell temperature. Under these conditions, simulations show that smaller panels are more predictable in terms of power production. Opposite to this trend, the Pacific Northwest tends to have persistent cloud cover and relatively low solar irradiance even during the summer. Therefore, a larger panel, KD135GX, is preferred because the operating environment receives less irradiance.
Finally, the fall shows a geographic transition from north to south favoring, in turn, the modules FS272, KD135GX, and KS20. This is the time of the year when solar irradiance remains relatively high while cloud cover slowly increases. A smaller panel is preferred in the north because of high cloud cover and in the south because modules produce above their nominal capacity.
Figure 11 compares the 11 PV systems by showing which system outperforms the others, but it does not show how much the simulation errors differ among the systems. Figure 12 shows how the simulation error differs among the 11 PV systems.
Figure 12A shows the median of the simulation error from the 11 PV systems as a black solid line and the range of simulation error in orange. The simulation error is calculated for solar noon on each day in 2019 and then averaged within a particular ensemble. The range then shows the spread of errors across the 11 PV systems.
Figures 12B–12E show seasonal maps of the simulation error range. For a particular grid point, the seasonal RMSE of the best-performing module is subtracted from that of the worst-performing module (each already averaged over the 21 weather ensemble members). This difference in error shows the potential improvement that can be expected when using the PV panel that performs best at that location.
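In other words, the mapped quantity is, for each grid point, the spread between the worst and best per-module seasonal RMSE, and the module index attaining the minimum gives the best panel at that location. A minimal numpy sketch under the assumption that the seasonal RMSE is stored as an array of shape (n_modules, n_grid_points); the values below are placeholders:

import numpy as np

rng = np.random.default_rng(0)
n_modules, n_grid = 11, 1000  # placeholder number of grid points

# Seasonal RMSE per module and per grid point, already averaged over the
# 21 weather ensemble members (placeholder values).
rmse = rng.uniform(0.5, 3.5, size=(n_modules, n_grid))

# Error range shown in Figures 12B-12E: worst module minus best module.
error_range = rmse.max(axis=0) - rmse.min(axis=0)

# Index of the best-performing module at each grid point (as in Fig. 11).
best_module = rmse.argmin(axis=0)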
It is important to point out that the power simulation of a 10 KW system has a very fine granularity, meaning that the power prediction for each PV module is simulated first and then linearly scaled. The prediction error is, therefore, higher than that reported in the previous literature (Alessandrini et al., 2015; Cervone et al., 2017; Abuella and Chowdhury, 2017; Zhang et al., 2019). Linear scaling is required because dense, observed power records are not available over the CONUS; previous research, in fact, used observations from a limited sample of locations. An additional difference is that this research does not focus on predicting the output of a power plant, but rather on the production of power from individual modules, which is then scaled to the desired power output.
The RMSE has an increasing trend in the winter and a decreasing trend in the fall (Fig. 12A), which correlates well with the yearly pattern of solar irradiance. However, on average, the GHI is below 500 W/m² during these two seasons (Fig. 10B), and PV panels do not produce high power during these periods. Therefore, the predictability of the 11 PV systems is similar because of the relatively low power production, which is shown by the small range in Figure 12A and the mostly blue regions in Figures 12B and 12E.
During spring and summer, the predictability of power systems starts to differ. The RMSE, on average, drops from 3 KW to 2.5 KW from the spring going into the summer (Fig. 12A) due to decreased cloud cover. But the range of the RMSE remains large compared to the fall and the winter. Persistent cloud cover and a high level of solar irradiance in the spring are likely to affect the predictability of power production. A larger predictability difference appears in the Southern and Midwestern U.S. (Fig. 12C), reaching around 420 W of RMSE difference, which is equivalent to 14% of the average RMSE and 4.2% of the total power generation.
During the summer, the situation is slightly different because cloud cover is typically low but solar irradiance is at its highest, reaching 870 W/m². Without the impact of cloud cover, the overall seasonal difference in predictability (Fig. 12D) is lower than in the spring. Southern Texas and southern Florida also show large differences in predictability when comparing the results of the 11 PV systems, with an RMSE difference of over 300 W, which is equivalent to more than 12%.
The difference in the RMSE, or the range of the RMSE across the PV systems, is not caused by differences in weather conditions, because all power simulations have been carried out using the same weather input generated by AnEn. The weather conditions are therefore held constant when comparing power production from the PV systems. The difference is instead caused by how each module performs under different weather conditions. Modules have different efficiency levels and are made of different materials, and therefore they respond differently to weather changes. As a result, while improving weather forecasts helps to predict power production, the additional uncertainty introduced by simulating the modules also needs to be studied. As previously discussed, the characteristics of the modules can account for over 12% of the prediction error.
Table 5 summarizes the annual power generation and the simulation errors from the 11 simulated PV systems over the CONUS. The power generation (the Analysis and the Forecasted columns) is calculated as the sum of the hourly power production for the entire year of 2019 and then averaged across the CONUS grid points. The RMSE is the annual power production error averaged across the CONUS and then normalized by the analysis field to yield a percentage. If two panels have the same normalized RMSE, they are sorted based on the RMSE.
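The normalization can be written as the annual RMSE divided by the corresponding analysis value, expressed as a percentage. A minimal sketch with illustrative numbers, not the actual Table 5 entries:

def normalized_rmse(rmse_kw, analysis_kw):
    """Annual RMSE expressed as a percentage of the analysis power."""
    return 100.0 * rmse_kw / analysis_kw

# Illustrative values only; see Table 5 for the actual results.
print(f"{normalized_rmse(rmse_kw=700.0, analysis_kw=24_800.0):.2f}%")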
The largest simulated module, SP128, has a power capacity of 400 W, and it is ranked #4 (bottom-up) out of the 11 modules. The smallest simulated module, KS20, has a power capacity of 20 W, and it is ranked #9 (bottom-up) out of the modules. Although a higher power production is usually associated with a larger prediction error at the hourly temporal scale, this relation can change when analyzed at a different temporal scale (for example, annual). On a yearly basis, SP128 is forecasted to generate 24.59 MW of power, 610 KW greater than that of KS20, yet the RMSE of SP128 still remains lower than that of KS20.
The last row of Table 5 shows the results when using the best module at each location rather than the same panel for the entire CONUS. The selection of modules is based on optimizing the annual predictability and minimizing the prediction error. As a result (Fig. 13), a majority of grid points prefer FS272 as the most predictable module, and this amounts to 31,108 grid points and 56.55% of all grid points. Following FS272 are SF160S (9524/16.77%), KD135GX (7436/13.10%), KS20 (2445/4.31%), STU300 (1728/3.04%), and ND216U1F (1421/2.50%). These panels account for more than 95% of the grid points within the CONUS, and in total, 10 modules were selected, with the exclusion of KC85T. The exclusion of KC85T is also expected from Figure 11A, in which KC85T is ranked last (bottom-up). There are few grid points throughout the year that prefer this particular module.
With the 10-module portfolio for the CONUS, the forecasted annual power generation is 24.80 MW per grid, and the RMSE is 699.82 KW. This prediction error is the lowest among all single-module scenarios. Figure 13 shows the geographic distribution of the 10 modules in the composite scenario. FS272 is selected as the optimal module across a large area in the central to northern section of the CONUS. This region features a low-to-middle level of solar irradiance and a moderate level of total cloud cover. Since the power production is not particularly high compared to other regions of the CONUS, a smaller solar panel with a 72.5 W capacity is preferred. While an even smaller module, KS20, is available, it was not chosen because of the difference in predictability. The sub-region that covers Texas and most of the Southeast features an exceedingly high level of solar irradiance and a fair amount of cloud cover. Modules tend to produce more than their rating under standard test conditions (STC) during the summer. Under this condition, KD135GX, with a 135 W capacity, is selected for most regions. In Florida, where an even higher level of solar irradiance is found, it is impossible to identify a single panel that consistently outperforms the others. The final pattern shown in Figure 13 marks the scattered regions that favor SF160S. These regions are characterized by either low cloud cover or relatively high solar irradiance.
5. ENABLING SCALABLE SIMULATION VIA THE RADICAL ENSEMBLE TOOLKIT
RADICAL EnTK is the workflow-engine component of the RADICAL Cybertools (RCT) (Balasubramanian et al., 2019). This software system is designed and implemented in accordance with the building blocks approach: each component is independently designed with well-defined entities, functionalities, states, events, and errors. The individual cybertools are designed to be consistent with a four-layered view of distributed systems for the execution of scientific workloads and workflows on high-performance computing (HPC) resources. Each layer has a well-defined functionality and an associated “entity.” The entities are workflows (or applications) at the top layer and resource-specific jobs at the bottom layer, with workloads and tasks as intervening transitional entities in the middle layers.
RCT has three main components: RADICAL-SAGA (RS) (Merzky et al., 2015), RADICAL-Pilot (RP) (Merzky et al., 2021, 2019), and RADICAL EnTK (Balasubramanian et al., 2018, 2016). RS is a Python implementation of the Open Grid Forum SAGA standard GFD.90 (Goodale et al., 2006), a high-level interface to distributed infrastructure components like job schedulers, file transfer, and resource provisioning services. RS enables interoperability across heterogeneous distributed infrastructures, improving their usability and enhancing the sustainability of services and tools. RP is a Python implementation of the pilot paradigm and architectural pattern (Turilli et al., 2018). Pilot systems enable users to submit pilot jobs to computing infrastructures and then use the resources acquired by the pilot to execute one or more tasks. These tasks are directly scheduled via the pilot and do not need to queue in the infrastructure’s batch system.
EnTK is a Python implementation of a workflow engine, which specializes in supporting the programming and execution of applications with ensembles of tasks. EnTK executes tasks concurrently or sequentially, depending on their arbitrary priority relation. Tasks are scalar, MPI, OpenMP, multi-process, and multi-threaded programs that run as self-contained executables. Tasks are not functions, methods, threads, or subprocesses.
5.1. Application Model
EnTK supports ensemble-based applications (EBA), where an ensemble is defined as a set of tasks. Ensembles may vary in the number of tasks, the types of tasks, the tasks’ executables (or executing kernels), and the tasks’ resource requirements. EBAs may vary in the number of ensembles or in the runtime dependencies among ensembles. The space of EBAs is vast, and thus there is a need for simple and uniform abstractions that avoid single-point solutions.
Currently, the use cases motivating EnTK require a maximum of O(10⁴) tasks, but EnTK is designed to support up to O(10⁶) tasks because of the rate at which this requirement has been increasing, especially in biomolecular simulations, where a greater number of tasks is associated with greater sampling or more precise free-energy calculations. EnTK supports the resubmission of failed tasks, without application checkpointing, and the restarting of a failed runtime system (RTS) and its components. In this way, applications can be executed across multiple attempts without restarting completed tasks.
EnTK models EBA by combining the following user-facing constructs:
Task: An abstraction of a computational task that contains information regarding an executable, its software environment, and its data dependencies.
Stage: A set of tasks without mutual dependencies, which can be executed concurrently.
Pipeline: A list of stages where stage i can be executed only after stage i − 1 has been executed.
The application consists of a set of pipelines, where each pipeline is a list of stages, and each stage is a set of tasks. All pipelines can execute concurrently, all stages of each pipeline can execute sequentially, and all tasks of each stage can execute concurrently. Pipeline, stage, and task (PST) descriptions can be extended to account for dependencies among groups of pipelines in terms of lists of sets of pipelines. Further, the specification of branches in the execution flow of applications does not require altering the PST semantics. Branching events can be specified as tasks where a decision is made about the runtime flow. For example, a task could be used to decide to skip some elements of a stage, based on some partial results of the ongoing computation.
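These constructs map directly onto EnTK’s Python API. The following minimal sketch, based on the publicly documented radical.entk interface, builds one pipeline with two sequential stages; the executable names and arguments are illustrative placeholders, and the exact keys accepted by cpu_reqs may differ slightly across EnTK releases.

from radical.entk import Pipeline, Stage, Task

p = Pipeline()

# Stage 1: a set of independent tasks that may run concurrently.
s1 = Stage()
for i in range(4):
    t = Task()
    t.executable = "generate_analogs"        # illustrative executable
    t.arguments  = ["--domain", str(i)]
    t.cpu_reqs   = {"cpu_processes": 1, "cpu_process_type": None,
                    "cpu_threads": 36, "cpu_thread_type": "OpenMP"}
    s1.add_tasks(t)

# Stage 2: executed only after every task in Stage 1 has completed.
s2 = Stage()
t = Task()
t.executable = "simulate_power"              # illustrative executable
s2.add_tasks(t)

p.add_stages(s1)
p.add_stages(s2)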
5.2. Architecture
EnTK sits between the user and the HPC platform, abstracting resource management and execution management from the user. Figure 14 shows the components (purple) and subcomponents (green) of EnTK, which is organized in three layers: API, Workflow Management, and Workload Management.
The API layer enables users to codify PST descriptions. The Workflow Management layer retrieves information from the user about available infrastructures, initializes EnTK, and holds the global state of the application during execution. The Workload Management layer acquires resources via the RTS. The Workflow Management layer has two components: AppManager and WFProcessor. AppManager uses the Synchronizer subcomponent to update the state of the application at runtime. WFProcessor uses the Enqueue and Dequeue subcomponents to queue and dequeue tasks to and from the Workload Management layer. The Workload Management layer uses ExecManager and its Rmgr, Emgr, RTS Callback, and Heartbeat subcomponents to acquire resources from infrastructures and execute the application. Another benefit of this architecture is the isolation of the RTS into a stand-alone subsystem. This enables the composability of EnTK with diverse RTSs and, depending on capabilities, multiple types of infrastructures. Further, EnTK treats the RTS as a black box, which enables fault tolerance: when the RTS fails or becomes unresponsive, EnTK can tear it down and bring it back up, losing only those tasks that were in execution at the time of the RTS failure.
5.3. Execution Model
EnTK components and subcomponents communicate and coordinate for the execution of tasks. Users describe an application via the API, instantiate the AppManager component with information about the available infrastructures, and then pass the application description to AppManager for execution. AppManager holds these descriptions and, upon initialization, creates all the queues, spawns the Synchronizer, and instantiates the WFProcessor and ExecManager. WFProcessor and ExecManager instantiate their own subcomponents.
Once EnTK is fully initialized, WFProcessor initiates the execution by creating a local copy of the application description from AppManager and tagging tasks for execution. Enqueue pushes these tasks to the Pending queue (① in Fig. 14). Emgr pulls tasks from the Pending queue (② in Fig. 14) and executes them using an RTS (③ in Fig. 14). RTS Callback pushes tasks that have completed execution to the Done queue (④ in Fig. 14). Dequeue pulls completed tasks (⑤ in Fig. 14) and tags them as done, failed, or canceled, depending on the return code from the RTS.
Throughout the execution of the application, tasks, stages, and pipelines undergo multiple state transitions in both WFProcessor and ExecManager. Each component and subcomponent synchronizes these transitions with AppManager by pushing messages through dedicated queues (⑥ in Fig. 14). AppManager pulls these messages and updates the application states. AppManager then acknowledges the updates via dedicated queues (⑦ in Fig. 14). This messaging mechanism ensures that AppManager is always up-to-date with any state change, making it the only stateful component of EnTK.
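On the execution side, the PST description is handed to an AppManager together with a resource description for the target machine. The sketch below again follows the documented EnTK API; the resource label, project, queue, core count, and the RabbitMQ hostname and port (required by older EnTK releases for the messaging described above) are illustrative placeholders.

from radical.entk import AppManager, Pipeline, Stage, Task

# A trivial one-task application used only to illustrate the execution flow.
t = Task()
t.executable = "/bin/date"
s = Stage()
s.add_tasks(t)
p = Pipeline()
p.add_stages(s)

# Older EnTK releases coordinate components through RabbitMQ; the endpoint
# below is a placeholder.
appman = AppManager(hostname="localhost", port=5672)

appman.resource_desc = {
    "resource": "ncar.cheyenne",   # illustrative resource label
    "project":  "PROJECT_ID",      # placeholder allocation
    "queue":    "regular",
    "walltime": 30,                # minutes
    "cpus":     36,
}

appman.workflow = set([p])
appman.run()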
5.4. Integration of Parallel Analog Ensemble
The integration of PAnEn and RADICAL EnTK provides good examples of both the ensemble-of-pipelines model and the pipeline-of-ensembles model.
The ensemble-of-pipelines model refers to a workflow in which each pipeline is an independent execution unit that may consist of several stages, and several such pipelines execute in parallel. This model is adopted for the weight optimization described in Section 4.1. In total, there are 8001 unique weight combinations to test, and the trials are independent of one another, meaning that the execution of one weight combination does not need to wait for or communicate with any other execution. Each trial consists of a pipeline that generates weather analogs, simulates power production, and finally verifies the power simulation results against the forecast analysis. The three stages must be executed in series, and therefore they are placed in an ordered pipeline. Since the process needs to be repeated for the NN and the RB, each pipeline has six tasks in total.
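Under these assumptions, the weight-optimization workflow can be sketched as one EnTK pipeline per weight combination, each with three ordered stages and, in every stage, one task for the NN and one for the RB configuration. The executable names and the tiny list of weight combinations below are placeholders, not the actual PAnEn binaries or the full 8001-trial grid.

from radical.entk import Pipeline, Stage, Task

def make_trial_pipeline(weights):
    """One weight-optimization trial: analogs -> power simulation -> verification,
    with one NN task and one RB task per stage (six tasks per pipeline)."""
    p = Pipeline()
    for step in ("gen_analogs", "simulate_power", "verify"):   # placeholder executables
        s = Stage()
        for method in ("NN", "RB"):
            t = Task()
            t.executable = step
            t.arguments  = ["--method", method,
                            "--weights", ",".join(map(str, weights))]
            s.add_tasks(t)
        p.add_stages(s)
    return p

# Illustrative subset of the 8001 unique weight combinations.
weight_combinations = [(1.0, 0.5, 0.0), (0.8, 0.6, 0.2)]
pipelines = [make_trial_pipeline(w) for w in weight_combinations]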
The pipeline-of-ensembles model refers to the workflow in which there are multiple sequential stages, and each stage consists of tasks that will be executed in parallel. The later simulation of the 11 PV modules adopted this model in two stages:
(1) AnEn forecasts were generated for each of the subset domains in parallel.
(2) Each of the 11 modules was used to simulate power production for the corresponding subset domain.
Note that since the computation for each subset domain is treated as an individual task, the tasks can be executed in parallel. There are 71 subset domains for the NN and 28 for the RB. The stages, on the other hand, are executed in sequence because the power simulation can only be carried out after the AnEn forecasts have been successfully generated. EnTK offers a flexible interface for defining the computational requirements of each task, so that each task can dynamically request an appropriate allocation based on the area of its subset domain. Given that there are dozens of subsets with different areas, this functionality frees the developer from having to specify the resource requirements manually.
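A matching sketch of the pipeline-of-ensembles model follows: a single pipeline with two stages, the first generating AnEn forecasts for every subset domain in parallel and the second simulating every module on every domain, with per-task cpu_reqs scaled by the domain area. The executables, the domain list, the module subset, and the cores-per-area heuristic are all illustrative assumptions.

from radical.entk import Pipeline, Stage, Task

# Illustrative subset domains as (name, relative area); the actual workflow
# used 71 domains for the NN and 28 for the RB.
domains = [("d00", 1.0), ("d01", 2.5), ("d02", 0.7)]
modules = ["SP128", "FS272", "KS20"]          # subset of the 11 PV modules

p = Pipeline()

# Stage 1: generate AnEn forecasts for each subset domain in parallel.
s1 = Stage()
for name, area in domains:
    t = Task()
    t.executable = "anen_generate"            # placeholder executable
    t.arguments  = ["--domain", name]
    # Request cores proportionally to the domain area (illustrative heuristic).
    t.cpu_reqs   = {"cpu_processes": max(1, int(36 * area)),
                    "cpu_process_type": "MPI",
                    "cpu_threads": 1, "cpu_thread_type": None}
    s1.add_tasks(t)
p.add_stages(s1)

# Stage 2: simulate power for every module on every domain; this stage starts
# only after all Stage 1 tasks have completed.
s2 = Stage()
for name, _ in domains:
    for module in modules:
        t = Task()
        t.executable = "simulate_pv"          # placeholder executable
        t.arguments  = ["--domain", name, "--module", module]
        s2.add_tasks(t)
p.add_stages(s2)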
In general, the pipeline-of-ensembles model offers more efficient concurrency because it utilizes lightweight tasks, so adopting this model wherever possible is recommended. The ensemble-of-pipelines model provides an additional layer of abstraction for users to create more complex scientific workflows. Table 6 summarizes the runtime and computational requirements of the two experiments executed on the National Center for Atmospheric Research (NCAR) supercomputer, Cheyenne.
6. FINAL REMARKS
Forecast ensembles typically offer higher prediction accuracy than deterministic predictions, and they provide additional information on uncertainty. However, running an ensemble NWP model usually involves both technical and computational challenges. Researchers often download forecast ensembles from archives, but the amount of data accessed can easily exceed dozens of gigabytes and sometimes a terabyte. Designing a scientific workflow that involves simulation ensembles requires parallelization and scalable tools.
This chapter demonstrates the ability to use the AnEn method and the RADICAL EnTK tools to design ensemble workflows at scale. First, deterministic predictions over the CONUS for 2019 have been transformed into 21-member forecast ensembles using AnEn. The forecasts have a spatial resolution of 12 km and a temporal resolution of 1 h. Second, the weather forecast ensembles are used to simulate point-by-point power production with 11 different PV modules. Over 110 TB of simulation data have been generated on the NCAR supercomputer, Cheyenne.
Results have shown that the predictability of solar power generation depends not only on meteorological variability but also on the configuration of the PV modules. To maximize the amount of predictable solar power, the type of PV module should be treated as an important factor, considered in relation to the geographic location and its interplay with long-term weather conditions.
ACKNOWLEDGMENTS
This work was partially supported by the National Science Foundation grant number 1639707.