Tutorial

This is a basic tutorial on how to use the PINN4GPR package.

The project has multiple parts:

Creation of a randomized railway track GPR dataset via gprMax. Two methods are available to calculate sample B-scans:
1. Using gprMax directly
2. Using a pre-trained CNN-based surrogate model.
Training of a CNN-based surrogate model for gprMax
Training of physics-informed neural network (PINN) architectures on the generated data.

The code examples below assume you have installed all the necessary dependencies for the project. If not, go to Installation.

Dataset creation

The code responsible for the creation of a randomized railway track dataset is inside the src/dataset_creation folder. In particular, the main script is src/dataset_creation/create_dataset.py.

The script runs in two stages: it creates randomized gprMax input files, then runs the gprMax simulations and post-processes their output files. Running it without arguments prints a usage message:

python src/dataset_creation/create_dataset.py

Since no configuration file was given, the script exits with an error asking for one. All the configuration for dataset generation is stored in YAML files, and two are already included in the repository: ascan_dataset_config.yaml and bscan_dataset_config.yaml. These pre-configured files generate A-scan and B-scan datasets, respectively. The values from the configuration files are automatically parsed using pydantic. The most important settings can also be specified on the command line, where they override the values in the configuration file for the current run. All configuration keys are described in the Dataset configuration section.
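The precedence rule above (command-line options override the configuration file) can be sketched with the standard library. This is a minimal sketch, not the project's implementation: it uses a dataclass instead of the pydantic model, covers only a hypothetical subset of keys, and assumes the YAML values have already been loaded (for example with yaml.safe_load). Only the -i, -r and --geometry_only flags come from this tutorial; DatasetConfig, build_parser and merge are hypothetical names.

```python
import argparse
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical subset of the dataset configuration keys."""
    n_samples: int = 10
    generate_input: bool = False
    run_simulations: bool = False
    geometry_only: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the create_dataset.py command line")
    parser.add_argument("config_file", help="YAML configuration file")
    # default=None distinguishes "flag not given" from an explicit True.
    parser.add_argument("-i", dest="generate_input", action="store_true", default=None)
    parser.add_argument("-r", dest="run_simulations", action="store_true", default=None)
    parser.add_argument("--geometry_only", action="store_true", default=None)
    return parser


def merge(config: DatasetConfig, args: argparse.Namespace) -> DatasetConfig:
    """Return a copy of `config` with every option given on the command line applied."""
    keys = {f.name for f in fields(config)}
    overrides = {k: v for k, v in vars(args).items() if k in keys and v is not None}
    return replace(config, **overrides)


# Values that would normally come from the YAML file (hypothetical numbers).
file_config = DatasetConfig(n_samples=100)
args = build_parser().parse_args(["bscan_dataset_config.yaml", "-ir"])
print(merge(file_config, args))
# DatasetConfig(n_samples=100, generate_input=True, run_simulations=True, geometry_only=False)
```

Using default=None for the boolean flags lets an omitted flag keep the configuration-file value instead of silently resetting it to False.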

Using gprMax simulations

The simplest way to generate a GPR dataset is to use gprMax for both the sample geometry generation and the GPR data simulation. After adapting a configuration file to your needs, generate the dataset with:

python src/dataset_creation/create_dataset.py config_file.yaml -ir

This will first create the gprMax input files (-i) and then run the simulations (-r).

Note

gprMax creates a separate geometry view (VTK) file for each A-scan, so generating geometry views is not recommended for B-scan datasets.

Warning

An interaction between the material used for steel sleepers and the PML formulation in gprMax causes EM waves to be reflected at the boundary of the simulation. As a result, the simulation output is completely wrong for samples where a steel sleeper lies at the boundary of the simulation. In our experience, this happens in around half of the samples containing steel sleepers when using the provided configuration files. A possible solution is to use the built-in pec (Perfect Electric Conductor) material for steel sleepers.
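The suggested workaround can be illustrated as a transformation of a generated gprMax input file. This is a minimal sketch, assuming the steel sleepers are written as gprMax #box commands (syntax `#box: x1 y1 z1 x2 y2 z2 material [smoothing]`) with a material identifier such as `steel`, which is a hypothetical name: check the generated .in files for the real one. Only #box is handled; other geometry commands would need the same change at their own material position, and changing the sampling code itself is the more robust fix.

```python
def use_pec_for(input_text: str, material_id: str) -> str:
    """Swap `material_id` for gprMax's built-in `pec` in #box commands."""
    lines = []
    for line in input_text.splitlines():
        command, sep, params = line.strip().partition(":")
        tokens = params.split()
        # #box: x1 y1 z1 x2 y2 z2 material [smoothing]; the material is the 7th parameter.
        if command == "#box" and sep and len(tokens) >= 7 and tokens[6] == material_id:
            tokens[6] = "pec"
            line = f"#box: {' '.join(tokens)}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical coordinates and material names.
example = """#box: 0.10 0.20 0 0.36 0.42 0.002 steel
#box: 0 0 0 1.50 0.20 0.002 ballast"""
print(use_pec_for(example, "steel"))
```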

Using a pre-trained CNN model

A pre-trained CNN model can be used to greatly speed up the generation of B-scan datasets. The CNN acts as a surrogate for gprMax's FDTD simulations: it takes as input the geometry maps generated by gprMax and outputs predictions of the associated B-scans.

To use this feature, first generate only the sample geometries with:

python src/dataset_creation/create_dataset.py config_file.yaml -ir --geometry_only

Then use the geom2bscan.py script to load the pre-trained model and predict the B-scans:

python src/dataset_creation/geom2bscan.py predict -d path/to/dataset_output -m path/to/model.keras -o path/to/output_dir

The geom2bscan module accepts additional arguments to specify the GPU number, the median mask path (see the CNN surrogate model training section) and the in-memory batch size.

A concrete example of dataset generation is:

python src/dataset_creation/create_dataset.py bscan_dataset_config.yaml -ir --geometry_only

python src/dataset_creation/geom2bscan.py predict -d dataset_bscan/output \
   -m checkpoints/geom2bscan/model.keras -o dataset_bscan/predictions \
   --mask_path checkpoints/geom2bscan/median_mask.npy --mem_batch_size 10000
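The --mem_batch_size option limits how many samples are held in memory at once. The idea can be sketched as processing the dataset in fixed-size groups; the chunks helper below is hypothetical and not part of geom2bscan.py, and the load/predict/save step is only indicated by a comment.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


samples = [f"scan_{i:04d}" for i in range(25)]
for batch in chunks(samples, 10):
    # Here one would load the geometry maps of `batch`, predict and save the B-scans.
    print(batch[0], "...", batch[-1], f"({len(batch)} samples)")
```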

Note

The script also accepts samples wider than the pre-trained model input. In this case, a sliding-window approach is used: predictions on overlapping windows are fused together into the final prediction. The window offset is half the model input width.

For this feature to work, the geometry width must be at least twice the model input width and an integer multiple of the offset.
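The note above implies a simple layout: windows of the model input width W start at multiples of W/2, so every column except the outermost half-windows is covered by exactly two windows. A minimal numpy sketch, assuming overlapping predictions are fused by averaging and that the model keeps the window width along the horizontal axis (both are assumptions; geom2bscan.py may fuse differently). predict stands for any callable, for example a wrapped Keras model.

```python
from typing import Callable, List

import numpy as np


def window_starts(width: int, window: int) -> List[int]:
    """Start columns of windows of size `window`, shifted by half a window each time."""
    offset = window // 2
    if window % 2 or offset < 1 or width < 2 * window or width % offset != 0:
        raise ValueError(f"need an even window and width >= {2 * window}, a multiple of {offset}")
    return list(range(0, width - window + 1, offset))


def sliding_predict(geometry: np.ndarray, predict: Callable[[np.ndarray], np.ndarray],
                    window: int) -> np.ndarray:
    """Predict a wide sample window by window and average the overlapping columns."""
    width = geometry.shape[-1]
    fused, counts = None, np.zeros(width)
    for start in window_starts(width, window):
        pred = predict(geometry[..., start:start + window])  # assumed to keep the window width
        if fused is None:
            fused = np.zeros(pred.shape[:-1] + (width,))
        fused[..., start:start + window] += pred
        counts[start:start + window] += 1
    return fused / counts


print(window_starts(12, 4))  # [0, 2, 4, 6, 8]
```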

Dataset contents

The dataset is split into an input folder and an output folder, namely the input_dir and output_dir given at dataset generation time.

The input folder contains both the gprMax input files and some metadata files, stored in the metadata folder. The metadata includes all the sampled quantities and properties of each sample in the dataset, as well as plots of the sampled distributions. A plaintext file with this information is available for each sample, while the all_data.pkl file contains a pickled instance of the src.dataset_creation.statistics.DatasetStats class holding the metadata of the full dataset.
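The dataset-wide metadata can be loaded back with the standard pickle module. A minimal sketch, assuming all_data.pkl sits in the metadata folder inside input_dir; the load_dataset_stats name is hypothetical, and no DatasetStats attributes are assumed. Unpickling a DatasetStats object requires the project's src package to be importable, for example by running from the repository root.

```python
import pickle
from pathlib import Path


def load_dataset_stats(input_dir):
    """Load the pickled dataset-wide metadata from <input_dir>/metadata/all_data.pkl."""
    path = Path(input_dir) / "metadata" / "all_data.pkl"
    # pickle can execute arbitrary code while loading: only unpickle files you trust.
    with path.open("rb") as f:
        return pickle.load(f)


# stats = load_dataset_stats("dataset_bscan/input")  # hypothetical input_dir
```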

The output folder contains all the post-processed gprMax outputs. Depending on the dataset configuration, each sample folder can include:

the sample geometry map in numpy .npy format.
the resulting A-scan or B-scan in HDF5 format, which can be loaded with the gprMax tools package.
electric and magnetic field snapshots in numpy .npz format.
a geometry view file in the Visualization Toolkit (VTK) format, which can be opened with ParaView.
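A quick way to check what a sample folder contains is to open its numpy files directly. A minimal sketch, assuming each sample has its own folder under output_dir (for example scan_0000); file names are not assumed, only extensions. The HDF5 scans are better read with the gprMax tools package or h5py, and the VTK views with ParaView, so the sketch only lists those files.

```python
from pathlib import Path

import numpy as np


def summarize_sample(sample_dir):
    """Map each file in a sample folder to the shapes of the arrays it contains."""
    summary = {}
    for path in sorted(Path(sample_dir).iterdir()):
        if path.suffix == ".npy":  # geometry map
            summary[path.name] = np.load(path).shape
        elif path.suffix == ".npz":  # field snapshots
            with np.load(path) as archive:
                summary[path.name] = {key: archive[key].shape for key in archive.files}
        else:  # HDF5 scans, VTK views, ...: listed but not opened here
            summary[path.name] = None
    return summary


# summarize_sample("dataset_bscan/output/scan_0000")  # hypothetical sample folder
```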

Dataset configuration

All the configuration keys related to dataset generation are:

Key	Description
n_samples	The number of samples to generate. These are automatically named `scan_0000`, `scan_0001` and so on.
n_ascans	The number of A-scans to create per sample.
seed	The random number generator seed used in dataset generation. The full dataset is deterministic given this value.
generate_input	If set, generate input files in `input_dir`.
run_simulations	If set, run the input files inside `input_dir`, including the ones just generated.
geometry_only	If set, only generate the geometries corresponding to the input files, without running the simulations.
input_dir	The folder in which to store the generated input files and from which to read them when running simulations.
tmp_dir	Temporary directory storing intermediate gprMax files before post-processing.
output_dir	Directory in which to store the final results.
track_configuration_probabilities	Probability of each track type in the random sampling.
domain_size	Size of the sample in meters in the x, y and z directions.
spatial_resolution	gprMax spatial resolution in meters.
time_window	Total duration of a simulation in seconds.
source_waveform	Name of the source waveform to use.
source_amplitude	Scaling factor for the amplitude of the source waveform.
source_central_frequency	Central frequency of the source signal.
source_position	Position of the source in meters.
receiver_position	Position of the receiver in meters.
step_size	Distance the source and receiver move between consecutive A-scans of the same B-scan.
fractal_dimension	Fractal dimension of the Peplinski soils, between 0 and 3.
pep_soil_number	Number of materials composing a Peplinski soil mixture model.
materials	Properties of all the materials required in the simulation, including Peplinski mixture models.
antenna_sleeper_distance	Vertical distance between the source and the top of the sleepers. Constant within each sample.
layer_sizes	Ranges for the size of all the layers in the simulation.
layer_roughness	Maximum randomly sampled vertical roughness (deviation) of the layers from their calculated size.
layer_sizes_beta_params	Beta distribution parameters for the layer size sampling.
sleepers_separation	Horizontal distance between two consecutive sleepers. Constant within each sample.
sleepers_material_probabilities	Probability of each sleeper material in the random sampling.
sleepers_sizes	Size of each sleeper given its material.
fouling_beta_params	Beta distribution parameters for the fouling sampling.
fouling_box_threshold	Threshold used in the random sampling to decide whether to add a fouling box behind the ballast stones.
general_water_content_beta_params	Beta distribution parameters for the general water content sampling.
water_infiltration_sampling_std	Standard deviation of the Gaussian distribution, centered on the general water content, used for sampling when water infiltration occurs.
water_infiltration_threshold	Threshold used in the random sampling to decide whether to add water infiltrations between layers.
layer_water_sampling_std	Standard deviation of the Gaussian distribution, centered on the general water content, used for sampling layer humidity.
general_deterioration_beta_params	Beta distribution parameters for the general deterioration sampling of the PSS and subsoil.
snapshot_times	Times at which to generate snapshots of the electric and magnetic fields for each A-scan.
create_views	Flag for creating geometry view files, which can be opened with ParaView. gprMax creates one view file per A-scan, so the flag is set to False for the B-scan dataset.
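Several of these keys interact: domain_size and spatial_resolution fix the number of grid cells, and together with time_window they determine how many FDTD iterations each A-scan needs. The sketch below estimates these numbers, assuming the same resolution in every direction and a time step at the Courant (CFL) limit, which is gprMax's default choice; gprMax's exact rounding may differ slightly. The function name and the example values are hypothetical, not taken from the provided configuration files.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def grid_and_iterations(domain_size, spatial_resolution, time_window):
    """Estimate grid cells, CFL time step (s) and number of iterations of a gprMax run."""
    cells = tuple(max(1, round(d / spatial_resolution)) for d in domain_size)
    # Courant (CFL) limit over the dimensions spanning more than one cell:
    # a dimension that is a single cell thick makes the model 2D.
    inverse_squares = sum(1 / spatial_resolution**2 for n in cells if n > 1)
    dt = 1 / (C0 * math.sqrt(inverse_squares))
    iterations = math.ceil(time_window / dt) + 1
    return cells, dt, iterations


# Hypothetical values: a 2D domain (z is one cell thick).
cells, dt, iterations = grid_and_iterations((1.5, 1.0, 0.002), 0.002, 20e-9)
print(cells, f"dt = {dt:.3e} s", f"{iterations} iterations")
```

For a 2D model, halving spatial_resolution quadruples the number of cells and halves the time step, so each A-scan costs roughly eight times more.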

CNN surrogate model training

The geom2bscan.py module can also train a CNN surrogate model for gprMax. The model is trained on a B-scan dataset that includes both geometry maps and B-scan outputs, using the command:

python src/dataset_creation/geom2bscan.py train -d path/to/dataset_output -o path/to/output_dir

Additional parameters are accepted, including the batch size, the number of training epochs and the GPU number.

Among others, the script creates the following files:

model.keras: the model checkpoint
median_mask.npy: the median mask used to pre-process the training labels. It must be provided at inference time to obtain accurate B-scan predictions.
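Why the mask is needed at inference time can be illustrated with one plausible form of such pre-processing. This is a hypothetical sketch: it assumes the mask is a pixel-wise median B-scan that is subtracted from the training labels, so the model learns the residual and the mask has to be added back to its predictions. The actual operation in geom2bscan.py may differ, and the function names are illustrative only.

```python
import numpy as np


def fit_median_mask(labels: np.ndarray) -> np.ndarray:
    """Pixel-wise median over the training B-scans (axis 0 indexes samples)."""
    return np.median(labels, axis=0)


def preprocess(labels: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Assumed label pre-processing: remove the median B-scan from each label."""
    return labels - mask


def postprocess(predictions: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Undo the assumed pre-processing on model outputs; this requires the mask."""
    return predictions + mask


rng = np.random.default_rng(0)
labels = rng.normal(size=(8, 64, 32))  # 8 hypothetical B-scans
mask = fit_median_mask(labels)  # would be saved, e.g. np.save("median_mask.npy", mask)
restored = postprocess(preprocess(labels, mask), mask)
print(np.allclose(restored, labels))  # True
```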

Warning

Samples affected by the PML problem described above are automatically removed from the training and test datasets. As a result, with the provided configuration files, the dataset contains fewer samples with steel sleepers, and none of them shows a steel sleeper at the border of the domain.

PINN models training

The src/pinns folder contains code to train various PINN models on different geometries and conditions. Each experiment can be run by executing the corresponding file:

python src/pinns/experiment.py

More information on the setup of each experiment can be found in the corresponding source file.