cellphe.segmentation package

Submodules

cellphe.segmentation.seg_errors module

cellphe.segmentation.seg_errors

Functions relating to handling segmentation errors.

cellphe.segmentation.seg_errors.balance_training_set(x: DataFrame, y: array) → DataFrame

Balances a training set when one class might be underrepresented. Uses the SMOTE algorithm.

Parameters:

x – A dataframe with one or more feature columns.
y – An array containing the class labels. Must contain one label per row of x. Labels can be either integers or strings.

Returns:

A dataframe with the same columns as the input, but with the number of rows now equal to 2 times the number of samples from the majority class. For example, if x had 5 ‘negative’ samples and 10 ‘positive’ samples, then the output will have 20 rows, because 5 additional ‘negative’ observations will have been generated by oversampling.
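
The row-count arithmetic above can be illustrated with a minimal, standard-library-only sketch of SMOTE-style oversampling. This is not the cellphe implementation: the function name smote_balance, its k and seed parameters, and the use of plain lists in place of a DataFrame are all assumptions made for illustration. Each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest same-class neighbours.

```python
import random
from collections import Counter


def smote_balance(rows, labels, k=2, seed=0):
    """Hypothetical SMOTE-style sketch: grow every class to the majority count.

    Assumes each class has at least two samples so a neighbour always exists.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        members = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):  # zero iterations for the majority class
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                others,
                key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
            )[:k]
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> neighbour
            out_rows.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
            out_labels.append(label)
    return out_rows, out_labels


# 5 'negative' and 10 'positive' samples, as in the example above
rows = [[float(i), float(i % 3)] for i in range(15)]
labels = ["negative"] * 5 + ["positive"] * 10
balanced_rows, balanced_labels = smote_balance(rows, labels)
print(len(balanced_rows), Counter(balanced_labels))
```

The original observations are kept unchanged and only synthetic minority rows are appended, which is why the total equals twice the majority count in the two-class case.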

cellphe.segmentation.seg_errors.predict_segmentation_errors(errors: DataFrame, clean: DataFrame, testset: DataFrame, num: int = 5, proportion: float = 0.7, num_repeats: int = 1) → array

Predicts whether or not cells have experienced segmentation errors through the use of decision trees fitted on labelled training data.

num decision trees are trained on the labelled training data (errors are features from incorrectly segmented cells and clean are features from known correctly segmented cells). They then predict whether the cells provided in the testset are segmented correctly or not. If a fraction proportion of the num trees vote for a segmentation error, then that cell is predicted to contain an error.

Optionally, this behaviour can be repeated num_repeats times, with the final outcome decided by a majority vote. For example, if num_repeats = 3, then 2 of the repeats must vote for an error; if num_repeats = 4, then 3 votes are required. In the R package, this behaviour is contained in a separate function, predictSegErrors_Ensemble.

Parameters:

errors – DataFrame containing the 1111 frame-level features from a set of cells known to be incorrectly segmented (having removed the CellID column).
clean – DataFrame containing the 1111 frame-level features from a set of cells known to be correctly segmented (having removed the CellID column).
testset – DataFrame containing the 1111 frame-level features from the cells to be assessed (having removed the CellID column).
num – Number of decision trees to fit.
proportion – Proportion of decision trees needed for a segmentation error vote to be successful.
num_repeats – The number of times to run the classification, with the final outcome coming from a majority vote.

Returns:

A NumPy boolean array with one element per row of testset, indicating whether the associated cell contains a segmentation error or not.
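
The two voting stages described above can be sketched in plain Python. This is an illustration of the rules only, not the library's code: the helper names tree_vote and repeat_vote are hypothetical, and the use of >= for the proportion threshold is an assumption, since the text does not say whether the comparison is strict. The strict majority across repeats follows from the stated examples (2 of 3, 3 of 4).

```python
def tree_vote(tree_predictions, proportion=0.7):
    """Hypothetical helper: tree_predictions[t][i] is True if tree t flags row i.

    A row is flagged when at least `proportion` of the trees vote for an error
    (assumption: the threshold is inclusive).
    """
    num = len(tree_predictions)
    n_rows = len(tree_predictions[0])
    flagged = []
    for i in range(n_rows):
        votes = sum(tree[i] for tree in tree_predictions)
        flagged.append(votes >= proportion * num)
    return flagged


def repeat_vote(repeat_results):
    """Hypothetical helper: strict majority across repeated runs."""
    num_repeats = len(repeat_results)
    n_rows = len(repeat_results[0])
    return [
        sum(rep[i] for rep in repeat_results) > num_repeats / 2
        for i in range(n_rows)
    ]


# num = 5 trees, three rows receiving 4, 3 and 0 error votes respectively
trees = [
    [True, True, False],
    [True, True, False],
    [True, True, False],
    [True, False, False],
    [False, False, False],
]
single_run = tree_vote(trees, proportion=0.7)

# num_repeats = 4: the first row gets 3 error votes, the second only 2
repeats = [[True, True], [True, True], [True, False], [False, False]]
final = repeat_vote(repeats)
print(single_run, final)
```

With the defaults num = 5 and proportion = 0.7 the threshold is 3.5 votes, so 4 of the 5 trees must agree under either reading of the comparison.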

cellphe.segmentation.seg_errors.remove_predicted_seg_errors(dataset: DataFrame, cellid_label: str, error_cells: list[int]) → DataFrame

Remove predicted segmentation errors from a data set.

This function can be used to automate removal of predicted segmentation errors from the test set.

Parameters:

dataset – The test set for which segmentation error predictions have been made.
cellid_label – Label for the column of cell identifiers within the test set.
error_cells – A list of cell identifiers for cells classified as segmentation errors. In the R package this is the output of either predictSegErrors() or predictSegErrors_Ensemble().

Returns:

A dataframe with the predicted errors removed.
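
The removal step amounts to dropping every row whose cell identifier appears in error_cells. The sketch below shows that filtering using a list of dictionaries as a stand-in for a DataFrame, so it runs without any dependencies. The helper names remove_cells and cells_with_errors are hypothetical, as is the rule used to turn per-row boolean flags into a list of cell identifiers (a cell is listed if any of its rows is flagged); the documentation above does not specify that conversion.

```python
def cells_with_errors(cell_ids, flags):
    """Hypothetical helper: unique cell ids with at least one flagged row.

    Assumption: one flagged row is enough to mark the whole cell.
    """
    found = []
    for cell_id, flag in zip(cell_ids, flags):
        if flag and cell_id not in found:
            found.append(cell_id)
    return found


def remove_cells(rows, cellid_label, error_cells):
    """Hypothetical helper: keep only rows whose cell id is not in error_cells."""
    drop = set(error_cells)
    return [row for row in rows if row[cellid_label] not in drop]


# Stand-in for a frame-level test set: several rows per cell
dataset = [
    {"CellID": 1, "Area": 120.0},
    {"CellID": 1, "Area": 118.5},
    {"CellID": 2, "Area": 310.2},
    {"CellID": 3, "Area": 95.7},
    {"CellID": 3, "Area": 97.1},
]
flags = [False, False, True, False, False]  # e.g. per-row predictions

error_cells = cells_with_errors([row["CellID"] for row in dataset], flags)
cleaned = remove_cells(dataset, "CellID", error_cells)
print(error_cells, [row["CellID"] for row in cleaned])
```

The input list is left unmodified and a new filtered list is returned, mirroring the documented behaviour of returning a dataframe with the predicted errors removed.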

Module contents

Functions related to predicting and removing segmentation errors.