Tutorial

This is a basic tutorial on how to use the PINN4GPR package.

The project has three main parts:

  1. Creation of a randomized railway track GPR dataset via gprMax. Two methods are available to calculate sample B-scans:

    1. Using gprMax directly

    2. Using a pre-trained CNN-based surrogate model.

  2. Training of a CNN-based surrogate model for gprMax

  3. Training of physics-informed neural network (PINN) architectures on the generated data.

The code examples below assume you have installed all the necessary dependencies for the project. If not, see the Installation section.

Dataset creation

The code responsible for the creation of a randomized railway track dataset is inside the src/dataset_creation folder. In particular, the main script is src/dataset_creation/create_dataset.py.

The script works in two stages: creating randomized gprMax input files, and running the gprMax simulations, including the postprocessing of the output files. Running it with no arguments prints a usage message:

python src/dataset_creation/create_dataset.py

The script then exits with an error asking for a configuration file. All dataset generation settings are stored in YAML files. Two are already included in the repository, ascan_dataset_config.yaml and bscan_dataset_config.yaml, which can be used to generate A-scan and B-scan datasets respectively. The values in the configuration files are automatically parsed with pydantic. The most important settings can also be specified on the command line, where they override the configuration file values for the current run. More information on all the configuration keys is in the Dataset configuration section.
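The override behaviour described above can be pictured with a minimal standard-library sketch. It uses a dataclass in place of the project's pydantic model and covers only a small subset of the keys from the Dataset configuration section. The `DatasetConfig` class and the `load_config` and `apply_overrides` functions are hypothetical and are not part of PINN4GPR.

```python
from dataclasses import dataclass, fields, replace
from pathlib import Path


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical subset of the dataset configuration keys."""

    n_samples: int
    n_ascans: int
    seed: int
    input_dir: Path
    output_dir: Path
    generate_input: bool = False
    run_simulations: bool = False
    geometry_only: bool = False

    def __post_init__(self):
        # Minimal validation, standing in for what a pydantic model would enforce
        if self.n_samples <= 0 or self.n_ascans <= 0:
            raise ValueError("n_samples and n_ascans must be positive")


def load_config(raw: dict) -> DatasetConfig:
    """Build a config from an already-parsed YAML mapping, rejecting unknown keys."""
    known = {f.name for f in fields(DatasetConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    values = dict(raw)
    for key in ("input_dir", "output_dir"):
        if key in values:
            values[key] = Path(values[key])
    return DatasetConfig(**values)


def apply_overrides(config: DatasetConfig, cli_values: dict) -> DatasetConfig:
    """Return a copy in which every command-line value that was set replaces the file value."""
    changes = {key: value for key, value in cli_values.items() if value is not None}
    return replace(config, **changes)  # replace() re-runs __post_init__ validation


# Example: a YAML file already parsed into a dict, then -ir passed on the command line
raw = {"n_samples": 10, "n_ascans": 1, "seed": 42,
       "input_dir": "dataset/input", "output_dir": "dataset/output"}
config = apply_overrides(load_config(raw),
                         {"generate_input": True, "run_simulations": True, "seed": None})
print(config.generate_input, config.run_simulations, config.seed)  # True True 42
```

Options left unset on the command line (here `seed=None`) keep their file value. This matches the "override for the present run" behaviour without ever modifying the YAML file itself.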

Using gprMax simulations

The simplest way to generate a GPR dataset is to use gprMax for both the sample geometry generation and the GPR data simulation. After adapting a configuration file to your needs, generate a dataset with:

python src/dataset_creation/create_dataset.py config_file.yaml -ir

This will first create the gprMax input files (-i) and then run the simulations (-r).

Note

gprMax creates a separate geometry view (VTK) file for each A-scan, so generating geometry views is not recommended for B-scan datasets.

Warning

A problem related to the material used for steel sleepers and the PML formulation in gprMax causes EM waves to be reflected at the simulation boundaries. As a result, the simulation results are completely wrong for samples with steel sleepers at the boundary of the domain. In our experience, using the provided configuration files, this affects around half of the samples containing steel sleepers. A possible workaround is to use the built-in pec (Perfect Electric Conductor) material for steel sleepers.
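The sketch below shows what the pec workaround could look like when writing a gprMax input file. It relies on gprMax's `#box: x1 y1 z1 x2 y2 z2 material` command and the built-in `pec` material. It also assumes that sleepers are modelled as boxes and that steel sleepers carry a user-defined material label `"steel"`. The helper `sleeper_box_command` is hypothetical and is not how PINN4GPR writes its input files.

```python
def sleeper_box_command(lower, upper, material, use_pec_for_steel=True):
    """Return a gprMax #box command for a box-shaped sleeper (hypothetical helper).

    lower and upper are the (x, y, z) corners of the box in meters.
    """
    if any(lo >= hi for lo, hi in zip(lower, upper)):
        raise ValueError("each lower coordinate must be smaller than the upper one")
    if use_pec_for_steel and material == "steel":
        material = "pec"  # gprMax built-in Perfect Electric Conductor
    coords = " ".join(f"{v:g}" for v in (*lower, *upper))
    return f"#box: {coords} {material}"


print(sleeper_box_command((0.1, 0.3, 0.0), (0.36, 0.45, 0.002), "steel"))
# #box: 0.1 0.3 0 0.36 0.45 0.002 pec
```

Swapping a dispersive steel model for pec removes the material from the problematic PML interaction, at the cost of treating steel as an ideal conductor.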

Using a pre-trained CNN model

It is possible to use a pre-trained CNN model to greatly improve the dataset generation speed when generating B-scans. The CNN model acts as a surrogate model for gprMax’s FDTD simulations. It takes as input the geometry maps generated by gprMax and outputs predictions for the associated B-scans.

To use this feature, first generate only the sample geometries with:

python src/dataset_creation/create_dataset.py config_file.yaml -ir --geometry_only

Then use the geom2bscan.py script to load the pre-trained model and predict the B-scans:

python src/dataset_creation/geom2bscan.py predict -d path/to/dataset_output -m path/to/model.keras -o path/to/output_dir

The geom2bscan module accepts additional arguments to specify the GPU number, the median mask path (see the CNN surrogate model training section) and the in-memory batch size.

A concrete example of dataset generation is:

python src/dataset_creation/create_dataset.py bscan_dataset_config.yaml -ir --geometry_only

python src/dataset_creation/geom2bscan.py predict -d dataset_bscan/output \
   -m checkpoints/geom2bscan/model.keras -o dataset_bscan/predictions \
   --mask_path checkpoints/geom2bscan/median_mask.npy --mem_batch_size 10000
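The --mem_batch_size option limits how many samples are processed in memory at once. The sketch below shows the generic batching pattern this implies. It assumes one geometry map per sample and leaves out the actual model call. The `iter_batches` helper is hypothetical and is not taken from geom2bscan.py.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical sample names; in practice each batch of geometry maps would be
# loaded, passed to the model and saved before the next batch is read
sample_names = [f"scan_{i:04d}" for i in range(25)]
batch_sizes = [len(batch) for batch in iter_batches(sample_names, 10)]
print(batch_sizes)  # [10, 10, 5]
```

A larger batch size means fewer load/predict cycles. A smaller one lowers peak memory usage.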

Note

The script also accepts samples wider than the input size of the pre-trained model. In this case, a sliding window approach is used: predictions on overlapping windows are fused into the final prediction. The offset between consecutive windows is half the model input width.

For this to work, the geometry width must be at least twice the model input width and an integer multiple of the offset.
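The window layout and the two width constraints can be illustrated with NumPy. This is a sketch under stated assumptions: overlapping predictions are fused here by simple averaging, each prediction is assumed to span the same columns as its input window, and the model input width is assumed to be even. The actual fusion rule in geom2bscan.py may differ, and `window_starts` and `fuse_windows` are hypothetical names.

```python
import numpy as np


def window_starts(width: int, input_width: int) -> list[int]:
    """Start columns of the sliding windows, with an offset of half the model input width."""
    if input_width % 2 != 0:
        raise ValueError("input_width is assumed to be even")
    offset = input_width // 2
    if width < 2 * input_width or width % offset != 0:
        raise ValueError("width must be >= 2 * input_width and a multiple of input_width // 2")
    return list(range(0, width - input_width + 1, offset))


def fuse_windows(predictions: list[np.ndarray], starts: list[int], width: int) -> np.ndarray:
    """Average overlapping window predictions into one array covering the full width."""
    height = predictions[0].shape[0]
    total = np.zeros((height, width))
    counts = np.zeros(width)
    for pred, start in zip(predictions, starts):
        stop = start + pred.shape[1]
        total[:, start:stop] += pred
        counts[start:stop] += 1  # how many windows cover each column
    return total / counts  # every column is covered at least once


starts = window_starts(width=8, input_width=4)
print(starts)  # [0, 2, 4]
preds = [np.full((3, 4), float(v)) for v in (0, 2, 4)]
fused = fuse_windows(preds, starts, width=8)
print(fused[0])  # [0. 0. 1. 1. 3. 3. 4. 4.]
```

The two constraints guarantee that the last window ends exactly at the right edge, so every column is covered and no partial window is needed.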

Dataset contents

The dataset is split into an input folder and an output folder, corresponding to the input_dir and output_dir provided at generation time.

The input folder contains both the gprMax input files and some metadata files, stored in the metadata folder. These include all the sampled quantities and properties of each sample in the dataset, as well as some plots of the sampled distributions. A plaintext file with this information is available for each sample, while the all_data.pkl file contains a pickled instance of the src.dataset_creation.statistics.DatasetStats class with the metadata for the full dataset.

The output folder contains all the post-processed gprMax outputs, with one folder per sample. Depending on the dataset configuration, each sample folder can include:

  • the sample geometry map in numpy .npy format.

  • the resulting A-scan or B-scan in HDF5 format, which can be loaded with the gprMax tools package.

  • electric and magnetic field snapshots in numpy .npz format.

  • a geometry view file in the Visualization Toolkit (VTK) format, which can be opened with ParaView.
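The sketch below indexes an output folder with this layout by grouping each sample folder's files by content type. It assumes one subfolder per sample, such as scan_0000. The suffix-to-type mapping in `SUFFIX_KINDS` is an assumption and should be checked against the actual file names in your output folder. `index_output_dir` is a hypothetical helper.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed mapping from file suffix to content type (not taken from the project)
SUFFIX_KINDS = {
    ".npy": "geometry",
    ".h5": "scan",
    ".out": "scan",
    ".npz": "snapshots",
    ".vti": "view",
    ".vtk": "view",
}


def index_output_dir(output_dir) -> dict[str, dict[str, list[Path]]]:
    """Group the files of each sample folder (e.g. scan_0000) by content type."""
    index = {}
    for sample_dir in sorted(p for p in Path(output_dir).iterdir() if p.is_dir()):
        groups = defaultdict(list)
        for path in sorted(sample_dir.iterdir()):
            kind = SUFFIX_KINDS.get(path.suffix)
            if kind is not None:  # ignore files with unknown suffixes
                groups[kind].append(path)
        index[sample_dir.name] = dict(groups)
    return index


# Demo on a throwaway folder layout with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("scan_0000/geometry.npy", "scan_0000/scan.h5",
                "scan_0000/notes.txt", "scan_0001/snapshots.npz"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo = {
        sample: {kind: [p.name for p in paths] for kind, paths in groups.items()}
        for sample, groups in index_output_dir(tmp).items()
    }
print(demo)
```

The indexed geometry maps and snapshot archives can then be read with `numpy.load`. For `.npz` files, this returns a lazily loaded archive keyed by array name.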

Dataset configuration

The configuration keys for dataset generation are:

  • n_samples: Number of samples to generate. Samples are automatically named scan_0000, scan_0001 and so on.

  • n_ascans: Number of A-scans to create per sample.

  • seed: Seed of the random number generator used during dataset generation. The full dataset is deterministic given this value.

  • generate_input: If set, generate the input files in input_dir.

  • run_simulations: If set, run the input files inside input_dir, including the ones just generated.

  • geometry_only: If set, only generate the geometries corresponding to the input files, without running the simulations.

  • input_dir: Folder in which the generated input files are stored and from which they are read when running simulations.

  • tmp_dir: Temporary directory for intermediate gprMax files before postprocessing.

  • output_dir: Directory in which the final results are stored.

  • track_configuration_probabilities: Probabilities of each track type in the random sampling.

  • domain_size: Size of the sample in meters in the x, y and z directions.

  • spatial_resolution: gprMax spatial resolution in meters.

  • time_window: Total duration of a simulation in seconds.

  • source_waveform: Name of the source waveform.

  • source_amplitude: Scaling factor for the amplitude of the source waveform.

  • source_central_frequency: Central frequency of the source signal.

  • source_position: Position of the source in meters.

  • receiver_position: Position of the receiver in meters.

  • step_size: Displacement of the source and receiver between consecutive A-scans of the same B-scan.

  • fractal_dimension: Fractal dimension of the Peplinski soils, between 0 and 3.

  • pep_soil_number: Number of materials composing a Peplinski soil mixture model.

  • materials: Properties of all the materials required in the simulation, including Peplinski mixture models.

  • antenna_sleeper_distance: Vertical distance between the source and the top of the sleepers. Constant within each sample.

  • layer_sizes: Ranges for the size of each layer in the simulation.

  • layer_roughness: Maximum randomly sampled vertical roughness (deviation) of the layers from their calculated size.

  • layer_sizes_beta_params: Beta distribution parameters for the layer size sampling.

  • sleepers_separation: Horizontal distance between two consecutive sleepers. Constant within each sample.

  • sleepers_material_probabilities: Probabilities of each sleeper material in the random sampling.

  • sleepers_sizes: Size of each sleeper depending on its material.

  • fouling_beta_params: Beta distribution parameters for the fouling sampling.

  • fouling_box_threshold: Threshold in the random sampling for adding a fouling box behind the ballast stones.

  • general_water_content_beta_params: Beta distribution parameters for the general water content sampling.

  • water_infiltration_sampling_std: Standard deviation of the Gaussian distribution, centered on the general water content, used for sampling when water infiltration occurs.

  • water_infiltration_threshold: Threshold in the random sampling for adding water infiltrations between layers.

  • layer_water_sampling_std: Standard deviation of the Gaussian distribution, centered on the general water content, used for sampling the layer humidity.

  • general_deterioration_beta_params: Beta distribution parameters for the general deterioration sampling of the PSS and subsoil.

  • snapshot_times: Times at which snapshots of the electric and magnetic fields are generated for each A-scan.

  • create_views: Flag enabling the creation of geometry view files, which can be opened with ParaView. gprMax creates one view file per A-scan, so the flag is set to False for the B-scan dataset.
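To illustrate how the seed, the range/Beta-parameter pairs and the standard deviations could interact, here is a sketch using Python's `random` module. It assumes three things: a layer size is a Beta sample rescaled to its configured range, the layer water content is a Gaussian sample centered on a Beta-distributed general water content, and that Gaussian sample is clipped to [0, 1]. The numeric values, `SampledLayer`, `sample_layer` and `sample_track` are all hypothetical, and the project's actual sampling code may differ.

```python
import random
from dataclasses import dataclass


@dataclass
class SampledLayer:
    thickness: float      # meters
    water_content: float  # fraction in [0, 1]


def sample_layer(rng: random.Random, size_range: tuple[float, float],
                 size_beta: tuple[float, float], general_water: float,
                 layer_water_std: float) -> SampledLayer:
    """Hypothetical sampling of one layer's size and humidity."""
    low, high = size_range
    # Beta sample in [0, 1], rescaled to the configured size range
    thickness = low + (high - low) * rng.betavariate(*size_beta)
    # Gaussian centered on the general water content, clipped to a valid fraction
    water = min(max(rng.gauss(general_water, layer_water_std), 0.0), 1.0)
    return SampledLayer(thickness, water)


def sample_track(seed: int) -> list[SampledLayer]:
    rng = random.Random(seed)  # a single seeded generator makes every draw reproducible
    general_water = rng.betavariate(2.0, 5.0)  # stands in for general_water_content_beta_params
    # Made-up (size range, beta params) pairs standing in for layer_sizes / layer_sizes_beta_params
    layers = [((0.25, 0.40), (2.0, 2.0)), ((0.10, 0.30), (2.0, 3.0))]
    return [sample_layer(rng, r, b, general_water, 0.05) for r, b in layers]


print(sample_track(42) == sample_track(42))  # True
```

Drawing every quantity from one generator seeded once is what makes the full dataset reproducible from the seed alone. Note that adding or reordering draws changes all subsequent samples.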

CNN surrogate model training

The geom2bscan.py module can also be used to train a CNN surrogate model for gprMax. The model is trained on a B-scan dataset including both geometry maps and B-scan outputs, with the command:

python src/dataset_creation/geom2bscan.py train -d path/to/dataset_output -o path/to/output_dir

Additional parameters are accepted, including the batch size, the number of training epochs and the GPU number.

Among others, the script creates the following files:

  • model.keras: the model checkpoint

  • median_mask.npy: the median mask used to pre-process the training labels. It must be provided at inference time to obtain accurate B-scan predictions.

Warning

The samples affected by the PML problem described above are automatically removed from the training and test datasets. This means that, with the provided configuration files, fewer samples with steel sleepers will be present in the dataset, and none of the remaining samples will have steel sleepers at the boundary of the domain.
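The exact removal criterion is not described here. As a rough sketch, assume each sample's sleepers are described by their material and horizontal extent. A filter could then drop any sample in which a steel sleeper reaches the left or right edge of the domain, optionally within a margin such as the PML thickness. The `Sleeper` class, the `"steel"` material label and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sleeper:
    material: str
    x_min: float  # meters
    x_max: float  # meters


def touches_boundary(sleepers, domain_width: float, margin: float = 0.0) -> bool:
    """True if any steel sleeper reaches within `margin` of the left or right boundary."""
    return any(
        s.material == "steel" and (s.x_min <= margin or s.x_max >= domain_width - margin)
        for s in sleepers
    )


def keep_clean_samples(samples: dict, domain_width: float, margin: float = 0.0) -> list[str]:
    """Names of the samples unaffected by the steel-sleeper PML issue."""
    return [name for name, sleepers in samples.items()
            if not touches_boundary(sleepers, domain_width, margin)]


samples = {
    "scan_0000": [Sleeper("steel", 0.0, 0.26), Sleeper("steel", 0.6, 0.86)],
    "scan_0001": [Sleeper("steel", 0.3, 0.56)],
    "scan_0002": [Sleeper("concrete", 0.0, 0.26)],
}
print(keep_clean_samples(samples, domain_width=1.5))  # ['scan_0001', 'scan_0002']
```

This explains the bias noted in the warning: steel sleepers survive the filter only when they lie away from the domain edges.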

PINN models training

The src/pinns folder contains code to train various PINN models on different geometries and conditions. Each experiment can be run by executing the corresponding file:

python src/pinns/experiment.py

More information on the setup of each experiment can be found in the corresponding source file.