Tutorial
This is a basic tutorial on how to use the PINN4GPR package.
The project has multiple parts:
Creation of a randomized railway track GPR dataset via gprMax. Two methods are available to calculate sample B-scans:
Using gprMax directly
Using a pre-trained CNN-based surrogate model.
Training of a CNN-based surrogate model for gprMax
Training of physics-informed neural network (PINN) architectures on the generated data.
The code examples below assume you have installed all the necessary dependencies for the project. If not, go to Installation.
Dataset creation
The code responsible for the creation of a randomized railway track dataset is inside the src/dataset_creation folder.
In particular, the main script is src/dataset_creation/create_dataset.py.
The execution of this script is divided into two parts: the creation of randomized input files, and running the gprMax simulations, including the postprocessing of the output files. Running it with no arguments will print a usage message:
python src/dataset_creation/create_dataset.py
The script exits with an error asking you to specify a configuration file. All the configuration for dataset generation is stored in YAML files.
Two files are already included in the repository: ascan_dataset_config.yaml and bscan_dataset_config.yaml. These
pre-configured files can be used to generate A-scan and B-scan datasets, respectively.
The values from the configuration files are automatically parsed using pydantic.
The most important settings can also be specified on the command line, where they override the values in the configuration file for the current run.
More info on all the configuration keys can be found in the Dataset configuration section.
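The override behaviour described above can be sketched with the standard library alone. This is not the parser used by create_dataset.py: the `--n_samples` and `--seed` options and the `merge_config` helper are hypothetical, and a plain dict stands in for the pydantic model.

```python
import argparse


def merge_config(file_config, cli_args):
    """Return a copy of file_config where CLI values that were given win."""
    merged = dict(file_config)
    for key, value in vars(cli_args).items():
        # None means "not passed on the command line": keep the file value.
        if value is not None:
            merged[key] = value
    return merged


def build_parser():
    # Hypothetical options, chosen only to mirror two documented config keys.
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_samples", type=int, default=None)
    parser.add_argument("--seed", type=int, default=None)
    return parser


# Values as they might be read from a YAML configuration file.
file_config = {"n_samples": 100, "seed": 42}
args = build_parser().parse_args(["--n_samples", "10"])
config = merge_config(file_config, args)
print(config)  # {'n_samples': 10, 'seed': 42}
```

Using `None` as the default for every option is what lets the merge tell "not given" apart from an explicit value, so the file is left untouched and only the current run is affected.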
Using gprMax simulations
The simplest way to generate a GPR dataset is to use gprMax for both the sample geometry generation and the GPR data simulation. After adapting a configuration file to your needs, all you need to do to generate a dataset is:
python src/dataset_creation/create_dataset.py config_file.yaml -ir
This will first create the gprMax input files (-i) and then run the simulations (-r).
Note
gprMax will create a different geometry view VTK file for each A-scan, so it is not recommended to generate them for B-scan datasets.
Warning
A problem related to the material used for steel sleepers and the PML formulation in gprMax causes EM waves to be reflected
at the boundary of the simulation domain. As a result, the simulation results are completely wrong for samples where steel
sleepers are present at the boundary of the domain. In our experience, this happens in around half of the samples
containing steel sleepers when using the provided configuration files. A possible solution is to use the built-in pec
(Perfect Electric Conductor) material for steel sleepers.
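As an illustration of the workaround mentioned above, gprMax's `#box:` input command accepts the built-in `pec` identifier as its material. The sketch below only formats such a line. The `pec_box_command` helper and the coordinates are hypothetical, and how the package's configuration maps sleeper materials to input-file commands is not covered here.

```python
def pec_box_command(lower, upper):
    """Format a gprMax #box: command that uses the built-in pec material.

    lower and upper are the (x, y, z) corners of the box in metres.
    """
    coords = " ".join(f"{value:g}" for value in (*lower, *upper))
    return f"#box: {coords} pec"


# Illustrative sleeper cross-section; the dimensions are made up.
line = pec_box_command((0.1, 0.5, 0.0), (0.36, 0.6, 0.002))
print(line)  # #box: 0.1 0.5 0 0.36 0.6 0.002 pec
```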
Using a pre-trained CNN model
It is possible to use a pre-trained CNN model to greatly improve the dataset generation speed when generating B-scans. The CNN model acts as a surrogate model for gprMax’s FDTD simulations. It takes as input the geometry maps generated by gprMax and outputs predictions for the associated B-scans.
To use this feature, first create a dataset with:
python src/dataset_creation/create_dataset.py config_file.yaml -ir --geometry_only
then, use the geom2bscan.py script to load the pre-trained model and use it to predict the B-scans:
python src/dataset_creation/geom2bscan.py predict -d path/to/dataset_output -m path/to/model.keras -o path/to/output_dir
The geom2bscan module accepts additional arguments to specify the GPU number, the median mask path
(see the CNN surrogate model training section) and the in-memory batch size.
A concrete example of dataset generation is:
python src/dataset_creation/create_dataset.py bscan_dataset_config.yaml -ir --geometry_only
python src/dataset_creation/geom2bscan.py predict -d dataset_bscan/output \
-m checkpoints/geom2bscan/model.keras -o dataset_bscan/predictions \
--mask_path checkpoints/geom2bscan/median_mask.npy --mem_batch_size 10000
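The same two steps can also be chained from Python. The sketch below only assembles the command lines shown above, using the flags that appear in this tutorial. The `build_commands` and `run_pipeline` helpers are hypothetical and not part of the package.

```python
import subprocess
import sys


def build_commands(config, dataset_output, model, predictions, mask_path=None):
    """Return the two command lines of the surrogate workflow as argv lists."""
    create = [sys.executable, "src/dataset_creation/create_dataset.py",
              config, "-ir", "--geometry_only"]
    predict = [sys.executable, "src/dataset_creation/geom2bscan.py", "predict",
               "-d", dataset_output, "-m", model, "-o", predictions]
    if mask_path is not None:
        predict += ["--mask_path", mask_path]
    return [create, predict]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = build_commands(
    "bscan_dataset_config.yaml",
    "dataset_bscan/output",
    "checkpoints/geom2bscan/model.keras",
    "dataset_bscan/predictions",
    mask_path="checkpoints/geom2bscan/median_mask.npy",
)
for argv in commands:
    print(" ".join(argv[1:]))
# run_pipeline(commands)  # uncomment to actually launch both steps
```

Passing `check=True` makes the prediction step run only if the geometry generation succeeded.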
Note
The script accepts samples with larger width than the pre-trained model input size. In this case, a sliding window approach is used, where multiple predictions are fused together to generate the final prediction. The offset is half the size of the model input.
For this feature to work, the geometry widths must be at least double the model input size and an integer multiple of the offset.
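The width constraints can be checked before running the prediction. The following is a minimal sketch of the window layout implied by the note, not the code used by geom2bscan.py: the `window_starts` helper is hypothetical, widths are counted in pixels, and an even model input size is assumed so that the offset is a whole number. How overlapping predictions are fused is not shown.

```python
def window_starts(width, input_size):
    """Return the left edge of every sliding window over a geometry.

    width and input_size are in pixels; the offset between consecutive
    windows is half the model input size.
    """
    if input_size % 2 != 0:
        raise ValueError("this sketch assumes an even model input size")
    offset = input_size // 2
    if width < 2 * input_size:
        raise ValueError("width must be at least double the model input size")
    if width % offset != 0:
        raise ValueError("width must be an integer multiple of the offset")
    return list(range(0, width - input_size + 1, offset))


starts = window_starts(width=192, input_size=64)
print(starts)  # [0, 32, 64, 96, 128]
```

With an offset of half the input size, every column except those in the first and last half-window is covered by exactly two windows.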
Dataset contents
The dataset is divided into an input folder and an output folder, which correspond to the
input_dir and output_dir provided at dataset generation time.
The input folder contains both the gprMax input files and some metadata files, the latter in the metadata folder.
The metadata files include all the sampled quantities and properties of each sample in the dataset,
as well as some plots showing the sampled distributions. A plaintext file with this information is available for each sample,
while the all_data.pkl file contains a pickled instance of the src.dataset_creation.statistics.DatasetStats class
with the metadata for the full dataset.
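Loading the pickled metadata could look like the sketch below. The `load_dataset_stats` helper and the example path are hypothetical, and the attributes of the returned DatasetStats object are not documented here, so the sketch stops at loading.

```python
import pickle
from pathlib import Path


def load_dataset_stats(metadata_dir):
    """Unpickle all_data.pkl from a dataset's metadata folder.

    Unpickling needs the src.dataset_creation.statistics module to be
    importable, so run this from the repository root. Only unpickle files
    you created yourself: pickle can execute arbitrary code on load.
    """
    with open(Path(metadata_dir) / "all_data.pkl", "rb") as handle:
        return pickle.load(handle)


# Example call; "dataset_bscan/input/metadata" is a hypothetical location.
# stats = load_dataset_stats("dataset_bscan/input/metadata")
# print(type(stats).__name__)
```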
The output folder contains all the post-processed gprMax outputs. Depending on the dataset configuration, each folder can include:
the sample geometry map in numpy .npy format.
the resulting A or B-scan in HDF5 format, which can be loaded with the gprMax tools package.
electric and magnetic field snapshots in numpy .npz format.
a geometry view file in the Visualization Toolkit format, which can be opened with Paraview.
Dataset configuration
All the configuration keys related to dataset generation are:
| Key | Description |
|---|---|
| n_samples | The number of samples to generate. These are automatically named |
| n_ascans | The number of A-scans to create per sample. |
| seed | The random number generator seed used in dataset generation. The full dataset is deterministic based on this value. |
| generate_input | If set, generate input files in |
| run_simulations | If set, run the input files inside |
| geometry_only | If set, only generate the geometries corresponding to the input files, but don’t run the simulations. |
| input_dir | The folder in which to store the generated input files and from which to read them when running simulations. |
| tmp_dir | Temporary directory to store intermediate gprMax files before the postprocessing. |
| output_dir | Directory in which to store the final results. |
| track_configuration_probabilities | Set probabilities for each track type in the random sampling. |
| domain_size | Size of the sample in meters (in the x, y, z directions). |
| spatial_resolution | gprMax spatial resolution in meters. |
| time_window | Total duration of a simulation in seconds. |
| source_waveform | Name of the source waveform to use. |
| source_amplitude | Scaling factor for the amplitude of the source waveform. |
| source_central_frequency | Central frequency of the source signal. |
| source_position | Position of the source signal in meters. |
| receiver_position | Position of the receiver in meters. |
| step_size | Movement of source and receiver between various A-scans belonging to the same B-scan. |
| fractal_dimension | Number representing the fractal dimension of Peplinski soils, between 0 and 3. |
| pep_soil_number | Number of materials composing a Peplinski soil mixture model. |
| materials | Properties of all the required materials in the simulation, including Peplinski mixture models. |
| antenna_sleeper_distance | Vertical distance between the source waveform and the top of the sleepers. Constant in each sample. |
| layer_sizes | Ranges for the size of all the layers in the simulation. |
| layer_roughness | Maximum randomly sampled vertical roughness (deviation) of the layers from their calculated size. |
| layer_sizes_beta_params | Beta distribution parameters for the layer size sampling. |
| sleepers_separation | Horizontal distance between two consecutive sleepers. Constant in each sample. |
| sleepers_material_probabilities | Set probabilities of each sleeper material in the random sampling. |
| sleepers_sizes | Size of each sleeper given their material. |
| fouling_beta_params | Beta distribution parameters for the fouling sampling. |
| fouling_box_threshold | Set threshold in the random sampling to add a fouling box behind the ballast stones. |
| general_water_content_beta_params | Beta distribution parameters for the general water content sampling. |
| water_infiltration_sampling_std | Standard deviation of the Gaussian distribution used for sampling if water infiltration occurs, with mean on the general water content. |
| water_infiltration_threshold | Set threshold in the random sampling to add water infiltrations between layers. |
| layer_water_sampling_std | Standard deviation of the Gaussian distribution used for sampling layer humidity, with mean on the general water content. |
| general_deterioration_beta_params | Beta distribution parameters for the general deterioration sampling of PSS and subsoil. |
| snapshot_times | Times at which to generate snapshots of the electric and magnetic fields for each A-scan. |
| create_views | Flag for geometry view files creation, which can be opened with Paraview. gprMax creates one view file per A-scan, so the flag is set to False for the B-scan dataset. |
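To give an intuition for how the seed, the Beta parameters and the probability keys interact, here is a minimal sketch using only the standard library. It is an assumption about the general technique, not the package's sampling code: the helper names, the layer range, the Beta parameters, and the material names and probabilities are all placeholders, and rescaling a Beta draw to a range is only one plausible reading of how layer_sizes and layer_sizes_beta_params combine.

```python
import random


def sample_in_range(rng, low, high, alpha, beta):
    """Draw a Beta(alpha, beta) value and rescale it from [0, 1] to [low, high]."""
    return low + (high - low) * rng.betavariate(alpha, beta)


def sample_category(rng, probabilities):
    """Pick one key of a {name: probability} mapping."""
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[n] for n in names])[0]


def sample_track(seed):
    # A dedicated generator makes every draw a function of the seed alone.
    rng = random.Random(seed)
    return {
        # Placeholder range (metres) and Beta parameters.
        "ballast_size": sample_in_range(rng, 0.25, 0.45, alpha=2.0, beta=2.0),
        # Placeholder material names and probabilities.
        "sleeper_material": sample_category(
            rng, {"concrete": 0.5, "wood": 0.3, "steel": 0.2}
        ),
    }


first = sample_track(seed=42)
second = sample_track(seed=42)
print(first == second)  # True: the same seed gives the same sample
```

Keeping all draws on a single seeded generator is what makes a whole dataset reproducible from one seed value.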
CNN surrogate model training
The geom2bscan.py module can be used to train a CNN surrogate model for gprMax. The model is trained on a B-scan dataset
that includes both geometry maps and B-scan outputs, using the command:
python src/dataset_creation/geom2bscan.py train -d path/to/dataset_output -o path/to/output_dir
Some additional parameters are accepted, including the batch size, training epochs and GPU number.
Among others, the script creates the following files:
model.keras: the model checkpoint.
median_mask.npy: the median mask used to pre-process training labels. It must be provided at inference time to obtain accurate B-scan predictions.
Warning
The samples affected by the aforementioned PML problem are automatically removed from the training/test datasets. This means that, using the provided configuration files, a smaller number of samples with steel sleepers will be present in the dataset. These samples will also never show steel sleepers on the border of the domain.
PINN models training
The src/pinns folder contains code to train various PINN models on different geometries and conditions.
Each experiment can be run by executing the corresponding file:
python src/pinns/experiment.py
More info on the settings of each experiment can be found in the corresponding source file.