geom2bscan module

This module contains code to:

Train a CNN-based black box neural network model to approximate the B-scan response from a given dataset.
Use a trained model to quickly compute B-scan predictions from a (possibly) geometry-only dataset.

src.dataset_creation.geom2bscan.build_network()

Builds a geom2bscan Keras model.

Returns:

tf.keras.Model: the geom2bscan keras model

src.dataset_creation.geom2bscan.check_shapes_equal(dataset_output_path: str | Path, model) → bool

Checks if the shapes of model input and dataset samples are equal.

Parameters:

dataset_output_path (str | Path): dataset output folder path.
model (keras.Model): model used for inference.

Returns:

bool: True if the model input and geometry shapes are equal, False otherwise.

src.dataset_creation.geom2bscan.filter_PML_bug_bscans(geometries: ndarray, bscans: ndarray, upper_limit: float = 1400000.0)

Filters the dataset to remove the samples in which the PML bug appears. Filtering uses a threshold on the sum of the absolute pixel values of each B-scan: samples affected by the PML bug exhibit many more reflections, so their sums are well separated from those of the other samples.

Parameters:

geometries (np.ndarray of shape [B, C, H, W]): sample geometries
bscans (np.ndarray of shape [B, H, W]): sample B-scans
upper_limit (float, optional): the upper limit threshold. A value of 1.4e6 has been found empirically to separate the dataset cleanly.

Returns:

tuple[np.ndarray, np.ndarray]: the filtered geometries and bscans
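
The thresholding described above can be illustrated with a minimal NumPy sketch. It is not the module's implementation: the helper name `filter_by_abs_sum` is hypothetical, and the strict `<` comparison is an assumption, since the documentation does not say whether the limit itself is included.

```python
import numpy as np

def filter_by_abs_sum(geometries, bscans, upper_limit=1.4e6):
    """Keep samples whose B-scan absolute pixel sum is below upper_limit.

    Hypothetical helper; the strict comparison is an assumption.
    """
    # One score per sample: sum of |pixel| over the H and W axes.
    scores = np.abs(bscans).sum(axis=(1, 2))
    keep = scores < upper_limit
    return geometries[keep], bscans[keep]

rng = np.random.default_rng(0)
geoms = rng.random((4, 2, 8, 8))        # [B, C, H, W]
bscans = rng.normal(size=(4, 8, 8))     # [B, H, W]
bscans[2] *= 1e6                        # simulate abnormally strong reflections
filtered_geoms, filtered_bscans = filter_by_abs_sum(geoms, bscans)
```

The same boolean mask indexes both arrays, so geometries and B-scans stay aligned after filtering.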

src.dataset_creation.geom2bscan.filter_initial_wave(train_labels: ndarray, test_labels: ndarray)

Filters the initial wave out of the labels by subtracting the median of the pixel values of the train labels.

Parameters:

train_labels (np.ndarray of shape [B, H, W, C]): the train B-scan labels
test_labels (np.ndarray of shape [B, H, W, C]): the test B-scan labels

Returns:

tuple[np.ndarray, np.ndarray, np.ndarray]: train labels, test labels, median image used for the filtering
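
A minimal sketch of this median subtraction, assuming the median is taken per pixel across the batch axis of the train labels (which matches the "median image" in the return value, but is not confirmed by the source). The helper name `subtract_train_median` is hypothetical.

```python
import numpy as np

def subtract_train_median(train_labels, test_labels):
    """Subtract the per-pixel median of the train labels from both sets.

    Hypothetical helper; taking the median over axis 0 is an assumption.
    """
    # Median over the batch axis only, so the result has shape [H, W, C].
    median_image = np.median(train_labels, axis=0)
    # Broadcasting applies the same image to every sample.
    return train_labels - median_image, test_labels - median_image, median_image

rng = np.random.default_rng(0)
train = rng.normal(size=(5, 4, 4, 1)) + 3.0
test = rng.normal(size=(2, 4, 4, 1)) + 3.0
train_f, test_f, median_image = subtract_train_median(train, test)
```

Computing the median from the train labels only keeps test-set information out of the preprocessing step.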

src.dataset_creation.geom2bscan.load_dataset(dataset_output_path: str | Path = PosixPath('dataset_bscan/gprmax_output_files'), indexes_interval: tuple[int, int] | None = None, verbose=True)

Loads the B-scan dataset at the specified location. Performs filtering of the PML bug related to steel sleepers.

Parameters:

dataset_output_path (str | Path, optional): location of the output folder of the dataset, by default Path("dataset_bscan/gprmax_output_files")
indexes_interval (tuple[int, int] | None): If specified, the interval of indexes to load, upper limit excluded. Default: None.
verbose (bool): Controls whether to print info and show a progress bar while loading the data. Default: True.

Returns:

tuple[np.ndarray, np.ndarray, list[str]]: data, labels, sample names

Predicts B-scans for the full dataset given and stores them as numpy arrays.

Parameters:

dataset_output_path (str | Path): Path to the dataset output folder with samples to predict.
model_checkpoint_path (str | Path): Path to the Keras model checkpoint to use for prediction.
output_dir (str | Path): Directory in which the predictions will be stored.
label_mask_path (str | Path | None): Path to the label median mask used for preprocessing labels during training. This mask is added to the predictions to obtain the output B-scan. If None, no postprocessing is done.
memory_batch_size (int | None): Number of samples loaded into memory for each prediction cycle. If None, the full dataset is loaded into memory.

Returns:

np.ndarray | None: predictions, only if memory_batch_size is None.
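
One way to picture the memory batching is as a sequence of half-open index intervals, each of which could be passed as `indexes_interval` to `load_dataset` (whose upper limit is excluded). The generator below is a hypothetical illustration; how the module actually iterates over the dataset is not shown in this reference.

```python
def batch_intervals(n_samples, memory_batch_size):
    """Yield (start, stop) pairs, stop excluded, covering range(n_samples).

    Hypothetical helper, not part of the documented module.
    """
    for start in range(0, n_samples, memory_batch_size):
        # The last interval is clipped so it never runs past the dataset.
        yield start, min(start + memory_batch_size, n_samples)

intervals = list(batch_intervals(10, 4))
```

For 10 samples and a batch size of 4, this yields the intervals (0, 4), (4, 8) and (8, 10).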

src.dataset_creation.geom2bscan.predict_batch(geometries: ndarray, model, mask: ndarray | None)

Predicts the B-scans for the specified geometries.

Parameters:

geometries (np.ndarray): Input geometries
model (tf.keras.Model): Model used for inference
mask (np.ndarray | None): Mask added to the predictions of the model to calculate the B-scans, or None.

Returns:

np.ndarray: predictions
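
The role of the mask can be sketched as follows. To keep the example self-contained, a plain callable `predict_fn` stands in for the Keras model; that stand-in, the helper name `predict_with_mask`, and the array shapes are assumptions made for illustration, and any further postprocessing the real function may apply is not covered.

```python
import numpy as np

def predict_with_mask(geometries, predict_fn, mask=None):
    """Run predict_fn and, if a mask is given, add it to every prediction.

    Hypothetical helper; predict_fn stands in for the model's predict call.
    """
    preds = predict_fn(geometries)
    if mask is not None:
        # The mask has no batch axis, so it broadcasts over all samples.
        preds = preds + mask
    return preds

# Stand-in "model" that returns an all-zero B-scan per input sample.
fake_predict = lambda g: np.zeros((g.shape[0], 4, 4, 1))
geoms = np.ones((2, 2, 4, 4))
mask = np.full((4, 4, 1), 0.5)
with_mask = predict_with_mask(geoms, fake_predict, mask)
without_mask = predict_with_mask(geoms, fake_predict, None)
```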

src.dataset_creation.geom2bscan.predict_batch_wide(geometries: ndarray, model, mask: ndarray | None)

Splits wide geometries into multiple inputs for the model, then combines the predictions back into a single B-scan.

Uses a sliding-window approach for predictions, with a stride of half the image width (192 / 2 = 96 pixels for the pretrained model).

Parameters:

geometries (np.ndarray): input wide geometries.
model (tf.keras.Model): Model used for inference.
mask (np.ndarray | None): label mask, or None.

Returns:

np.ndarray: predictions
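
The sliding-window idea can be sketched in NumPy under several assumptions that the reference does not confirm: geometries are laid out as [B, C, H, W], the window slides along the last axis, each tile prediction has shape [B, H, window], and overlapping predictions are averaged. The helper `predict_wide` and the callable `predict_fn` (standing in for the model) are hypothetical.

```python
import numpy as np

def predict_wide(geometries, predict_fn, window=192):
    """Predict tile by tile with a half-window stride and average overlaps.

    Hypothetical sketch; averaging the overlapping regions is an assumption.
    """
    stride = window // 2
    batch, _, height, width = geometries.shape
    if width < window or (width - window) % stride != 0:
        raise ValueError("width must equal window + k * stride")
    total = np.zeros((batch, height, width))
    counts = np.zeros(width)
    for start in range(0, width - window + 1, stride):
        tile = geometries[..., start:start + window]
        total[..., start:start + window] += predict_fn(tile)
        counts[start:start + window] += 1
    # Each column is divided by the number of windows that covered it.
    return total / counts

# Stand-in "model": returns the first channel of each tile unchanged.
first_channel = lambda tile: tile[:, 0]
rng = np.random.default_rng(0)
wide = rng.random((2, 2, 16, 384))
wide_preds = predict_wide(wide, first_channel)
```

With a width of 384 and a window of 192, the windows start at columns 0, 96 and 192, so the central columns are covered twice and the outer 96 columns on each side once.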

src.dataset_creation.geom2bscan.preprocess_data(geoms: ndarray, bscans: ndarray)

Preprocesses the data by scaling the relative permittivity values by 1/80 and taking the cube root of both the conductivity and B-scan values.

Parameters:

geoms (np.ndarray of shape [B, 2, H, W]): sample geometries
bscans (np.ndarray of shape [B, H, W]): sample B-scans

Returns:

tuple[np.ndarray, np.ndarray]: processed geometries and bscans
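
A minimal sketch of these transformations, assuming channel 0 of the geometries holds the relative permittivity and channel 1 the conductivity. The channel order is not stated in this reference, and the helper name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(geoms, bscans):
    """Scale permittivity by 1/80 and take cube roots of conductivity and B-scans.

    Hypothetical sketch; the channel order is an assumption.
    """
    out = geoms.astype(np.float64, copy=True)  # do not modify the input
    out[:, 0] /= 80.0                # assumed channel 0: relative permittivity
    out[:, 1] = np.cbrt(out[:, 1])   # assumed channel 1: conductivity
    # np.cbrt is defined for negative values, which B-scans contain.
    return out, np.cbrt(bscans)

geoms = np.empty((1, 2, 2, 2))
geoms[:, 0] = 80.0
geoms[:, 1] = 8.0
bscans = np.full((1, 2, 2), -27.0)
proc_geoms, proc_bscans = preprocess(geoms, bscans)
```

Using `np.cbrt` rather than `x ** (1 / 3)` matters here, because a fractional power of a negative float yields NaN in NumPy.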

src.dataset_creation.geom2bscan.save_predictions(preds: ndarray, output_dir: str | Path, sample_names: list[str])

Saves the predictions, each in its own file.

Parameters:

preds (np.ndarray): Predictions
output_dir (str | Path): Directory in which to save the files
sample_names (list[str]): ordered list of names for the samples to save

src.dataset_creation.geom2bscan.split_dataset(geometries: ndarray, bscans: ndarray, random_state: int = 42)

Splits the dataset into a train set and a test set.

Parameters:

geometries (np.ndarray): dataset geometries (input data)
bscans (np.ndarray): dataset B-scans (labels)
random_state (int, optional): seed for the dataset split, by default 42

Returns:

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]: train data, test data, train labels, test labels.
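
A seeded split with the same return order can be sketched with NumPy alone. This is an illustration, not the module's implementation: the helper name `split`, the `test_fraction` parameter and its value of 0.2 are hypothetical, and the library the module uses for the split is not stated in this reference.

```python
import numpy as np

def split(geometries, bscans, test_fraction=0.2, random_state=42):
    """Shuffle sample indices with a fixed seed and split both arrays alike.

    Hypothetical sketch; test_fraction is an assumed parameter.
    """
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(geometries))
    n_test = int(round(len(geometries) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Same order as documented: train data, test data, train labels, test labels.
    return (geometries[train_idx], geometries[test_idx],
            bscans[train_idx], bscans[test_idx])

geoms = np.arange(10 * 2 * 4 * 4, dtype=float).reshape(10, 2, 4, 4)
bscans = np.arange(10 * 4 * 4, dtype=float).reshape(10, 4, 4)
x_train, x_test, y_train, y_test = split(geoms, bscans)
```

Fixing the seed makes the split reproducible, so a model evaluated later sees the same test samples.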

src.dataset_creation.geom2bscan.train(dataset_output_path: str | Path, output_path: str | Path, epochs: int, batch_size: int)

Trains a geom2bscan model with the (labelled) data in the dataset.

Parameters:

dataset_output_path (str | Path): dataset output folder.
output_path (str | Path): Directory in which all training results will be saved.
epochs (int): number of training epochs.
batch_size (int): training batch size.