Evaluating the long-term evolution, safety and performance of nuclear waste disposal systems relies heavily on physically based numerical simulation (e.g., by finite-element or lattice-Boltzmann codes) of the relevant water flow, transport and geochemical processes (e.g., reactive transport of radionuclides) that take place through and within different porous systems (i.e., the barriers within a repository, such as concrete walls, geologic layers and earth covers). Over the last decades, driven by increased scientific knowledge and computing power, the numerical representation of the physico-chemical processes at play in these disposal systems has become increasingly complex. Both the simulation domain (e.g., a 1D, 2D or 3D finite-element grid) and the simulation period can be very large, while the modeled processes can be strongly nonlinear (e.g., variably saturated flow or geochemical reactions). Flow and reactive transport models can be made even more complex by including thermo-mechanical processes. Altogether, these aspects can make evolution and performance studies computationally expensive. In particular, long simulation times hamper important tasks that involve repeated model runs, such as (1) scenario-based simulation analysis, (2) Monte Carlo-based uncertainty assessment of the model predictions, (3) sensitivity analysis of the model parameters, and (4) calibration of the model parameters (also known as inverse modeling).
This PhD will therefore focus on developing new surrogate modeling (also called statistical emulation or data-driven modeling) approaches that enable simulation of the aforementioned complex processes (mainly flow and reactive transport) at a much lower computational cost. The driving idea is to devote the available computational budget (typically from 100 to a few thousand model runs) to the construction of a computationally cheap statistical emulator, either of the most CPU-intensive component of the original model (e.g., the geochemical solver of a reactive transport simulator) or of the full original model (e.g., a high-dimensional transient flow simulator). The tasks involving repeated simulation runs with different parameter values can then be performed with the resulting emulator at a small computational cost. Open challenges include the small number of available model runs, the model nonlinearities, the potentially high dimensionality of the model parameter and output spaces, and the treatment of the surrogate bias. To address these problems, several machine learning algorithms for nonlinear regression (including recent deep learning developments) will be investigated and some will be further developed.
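To make the emulation workflow concrete, the following minimal Python sketch spends a fixed budget of model runs on training a cheap emulator and then uses that emulator for a Monte Carlo uncertainty assessment. Everything in it is an illustrative assumption rather than part of the project: `expensive_model` is a hypothetical two-parameter analytic function standing in for a costly flow or reactive transport simulator, the emulator is a plain radial-basis-function kernel ridge regression (one of many possible nonlinear regression choices), and the budget of 200 runs, the length scale and the ridge term are arbitrary values.

```python
import numpy as np


def expensive_model(theta):
    """Hypothetical stand-in for a costly simulator (assumption, not a real code).

    Maps each row of `theta` (two parameters in [0, 1]) to one scalar output.
    """
    return np.exp(-theta[:, 0]) * np.sin(3.0 * theta[:, 1]) + theta[:, 0] * theta[:, 1]


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel matrix between the rows of `a` and `b`."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def fit_emulator(x_train, y_train, length_scale=0.3, ridge=1e-6):
    """Fit a kernel ridge regression emulator and return its predict function."""
    k = rbf_kernel(x_train, x_train, length_scale)
    # The small ridge term regularizes the linear solve.
    weights = np.linalg.solve(k + ridge * np.eye(len(x_train)), y_train)

    def predict(x_new):
        return rbf_kernel(x_new, x_train, length_scale) @ weights

    return predict


rng = np.random.default_rng(0)

# 1) Spend the computational budget on runs of the original model.
n_runs = 200  # assumed budget, within the "100 to a few thousand" range
x_train = rng.uniform(0.0, 1.0, size=(n_runs, 2))
y_train = expensive_model(x_train)

# 2) Build the cheap emulator from those runs.
emulator = fit_emulator(x_train, y_train)

# 3) Check the emulator against a few held-out runs of the original model.
x_test = rng.uniform(0.0, 1.0, size=(50, 2))
rmse = np.sqrt(np.mean((emulator(x_test) - expensive_model(x_test)) ** 2))

# 4) Run the repeated-simulation task (here Monte Carlo) on the emulator only.
x_mc = rng.uniform(0.0, 1.0, size=(5000, 2))
y_mc = emulator(x_mc)
mc_mean, mc_std = y_mc.mean(), y_mc.std()

print(f"held-out RMSE: {rmse:.2e}")
print(f"Monte Carlo mean: {mc_mean:.3f}, std: {mc_std:.3f}")
```

In this toy setting the 5000 Monte Carlo evaluations cost no additional simulator runs beyond the 200 training runs and the 50 held-out checks. The held-out error gives a first, rough indication of the surrogate bias mentioned above; a real application would need a more careful treatment, particularly with high-dimensional parameter and output spaces.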