cellphe.features package

Submodules

cellphe.features.frame module

cellphe.features.frame

Functions for extracting frame-level features.

cellphe.features.frame.calculate_density(df: DataFrame, radius_threshold: float = 6) → array

Calculates cellular density at each frame.

In particular, this is the total inverse distance from a cell in a frame to every other cell.

Parameters:

df – DataFrame with columns CellID, FrameID, x, y and Rad.

Returns:

A DataFrame with 3 columns: CellID, FrameID and dens (the calculated density).
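As a rough illustration of the "total inverse distance" idea described above, the following sketch sums 1/distance from each cell to every other cell in the same frame. It is not the library's implementation: the name inverse_distance_density is hypothetical, and the sketch ignores radius_threshold and the Rad column because their exact role is not described here.

```python
import numpy as np
import pandas as pd


def inverse_distance_density(df: pd.DataFrame) -> pd.DataFrame:
    """Sum of 1/distance from each cell to every other cell in its frame.

    Hypothetical helper: ignores radius_threshold and Rad (assumption).
    """
    records = []
    for frame_id, frame in df.groupby("FrameID"):
        xy = frame[["x", "y"]].to_numpy(dtype=float)
        # Pairwise Euclidean distances between all cells in this frame.
        dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
        # A cell's distance to itself is 0; set it to inf so 1/inf adds 0.
        np.fill_diagonal(dists, np.inf)
        dens = (1.0 / dists).sum(axis=1)
        for cell_id, d in zip(frame["CellID"], dens):
            records.append({"CellID": cell_id, "FrameID": frame_id, "dens": d})
    return pd.DataFrame(records, columns=["CellID", "FrameID", "dens"])


demo = pd.DataFrame(
    {
        "CellID": [1, 2],
        "FrameID": [1, 1],
        "x": [0.0, 3.0],
        "y": [0.0, 4.0],
        "Rad": [1.0, 1.0],
    }
)
density = inverse_distance_density(demo)
```

In this toy frame the two cells are 5 pixels apart, so each receives a density of 1/5.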

cellphe.features.frame.cell_features(df: DataFrame, roi_archive: str, frame_folder: str, framerate: float, minimum_cell_size: int = 8) → DataFrame

Calculates cell features from timelapse videos.

Calculates 74 features related to size, shape, texture and movement for each cell on every non-missing frame, as well as the cell density around each cell on each frame. NB: while the ROI filenames are expected to be provided in df and found in roi_archive, the frame filenames are just expected to follow the naming convention <some text>-<FrameID>.tiff, where FrameID is a 4 digit leading zero-padded number, corresponding to the FrameID column in df.

Parameters:
  • df – DataFrame where every row corresponds to a combination of a cell tracked in a frame. It must have at least columns CellID, FrameID and ROI_filename along with any additional features.

  • roi_archive – A path to a Zip archive containing multiple Region of Interest (ROI) files named in the format cellid-frameid.roi

  • frame_folder – A path to a directory containing multiple frames in TIFF format. It is assumed these are named under the pattern <experiment name>-<frameid>.tif, where <frameid> is a 4 digit zero-padded integer.

  • framerate – The frame-rate, used to provide a meaningful measurement unit for velocity, otherwise a scaleless unit is implied with framerate=1.

  • minimum_cell_size – Minimum height and width of the cell in pixels.

Returns:

A DataFrame with 77+N columns (where N is the number of imported features) and 1 row per cell per frame it is present in:
  • FrameID: the numeric frameID
  • CellID: the numeric cellID
  • ROI_filename: the ROI filename
  • ...: 74 frame specific features
  • ...: Any other data columns that were present in df

cellphe.features.frame.cooccurrence_matrix(image1: array, image2: array, mask: array, levels: int) → array

Calculates the cooccurrence matrix between 2 images downscaled to a given number of grayscale levels.

Parameters:
  • image1 – The first image as a 2D numpy array.

  • image2 – The second image as a 2D numpy array.

  • mask – A boolean mask with the same dimensions as image1 and image2

  • levels – Number of grayscale levels to downscale to.

Returns:

A levels x levels matrix of the cooccurrences of each level between the 2 images.
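One way to picture this is as a 2D histogram over pairs of quantised pixel values. The sketch below is illustrative only: cooccurrence_sketch is a hypothetical name, and the min-max rescaling used to reduce each image to the requested number of levels is an assumption, since the quantisation scheme is not described here.

```python
import numpy as np


def cooccurrence_sketch(image1, image2, mask, levels):
    """Count how often level i in image1 coincides with level j in image2."""

    def quantise(img):
        # Assumption: rescale each image to integer levels 0..levels-1
        # using its own minimum and maximum.
        img = np.asarray(img, dtype=float)
        lo, hi = img.min(), img.max()
        if hi == lo:
            return np.zeros(img.shape, dtype=int)
        scaled = (img - lo) / (hi - lo) * levels
        return np.minimum(scaled.astype(int), levels - 1)

    mask = np.asarray(mask, dtype=bool)
    # Keep only the pixels selected by the mask.
    q1 = quantise(image1)[mask]
    q2 = quantise(image2)[mask]
    cooc = np.zeros((levels, levels), dtype=int)
    # Unbuffered add so repeated (i, j) pairs are all counted.
    np.add.at(cooc, (q1, q2), 1)
    return cooc


img = np.array([[0, 1], [2, 3]])
cooc = cooccurrence_sketch(img, img, np.ones((2, 2), dtype=bool), levels=2)
```

Comparing an image with itself, as in this toy call, puts every count on the diagonal.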

cellphe.features.frame.curvature(boundaries: array, gap: int) → float

Calculates the curvature of a boundary.

Parameters:
  • boundaries – A 2D array of [[x1, y1], [x2, y2], …, [xn, yn]] pairs.

  • gap – The gap.

Returns:

The curvature as a float.

cellphe.features.frame.double_image(image: array) → array

Doubles the size of an image.

Parameters:

image – 2D numpy array representing the downscaled image with dimensions m x n.

Returns:

A 2D numpy array with dimensions 2m x 2n
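The documentation above only fixes the output shape, not how the new pixels are filled. As a minimal sketch, assuming nearest-neighbour upsampling (each pixel becomes a 2 x 2 block), the operation could look like the following; double_image_sketch is a hypothetical name.

```python
import numpy as np


def double_image_sketch(image):
    """Upsample an m x n image to 2m x 2n by repeating each pixel.

    Assumption: nearest-neighbour filling; the library may differ.
    """
    image = np.asarray(image)
    # Repeat every row twice, then every column twice.
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)


small = np.array([[1, 2], [3, 4]])
doubled = double_image_sketch(small)
```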

cellphe.features.frame.extract_static_features(image: array, roi: array) → array

Extracts the 68 frame-level static (i.e. no movement based) features for a given image and roi.

Parameters:
  • image – The image as a 2D Numpy array.

  • roi – The region of interest as an Mx2 Numpy array.

Returns:

A 1D array of length 68 containing the features.

cellphe.features.frame.get_frame_id_from_filename(fn: str) → int | None

Retrieves the FrameID from a given filename.

Parameters:

fn – The filename.

Returns:

An integer giving the frameID, or None if not found.
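Given the naming convention described for cell_features (some text, a hyphen, then a 4 digit zero-padded FrameID), the lookup could be sketched with a regular expression. This is an assumption about the approach rather than the library's code, and frame_id_from_filename_sketch is a hypothetical name. Because this page mentions both .tif and .tiff, the pattern accepts either extension.

```python
import re


def frame_id_from_filename_sketch(fn):
    """Return the 4 digit FrameID at the end of a frame filename, or None."""
    # Assumption: the ID directly follows the last hyphen and precedes
    # a .tif or .tiff extension.
    match = re.search(r"-(\d{4})\.tiff?$", fn)
    return int(match.group(1)) if match else None


frame_a = frame_id_from_filename_sketch("experiment-0012.tif")
frame_b = frame_id_from_filename_sketch("my-experiment-0003.tiff")
frame_c = frame_id_from_filename_sketch("notes.txt")
```

Here frame_a is 12, frame_b is 3, and frame_c is None because the name does not follow the convention.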

cellphe.features.frame.haar_approximation_2d(image: array) → array

Calculates the approximation coefficients of a 2D db1 (aka Haar) wavelet transform.

Parameters:

image – 2D numpy array containing the image pixels.

Returns:

A 2D numpy array containing the approximation coefficients.
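For an orthonormal Haar (db1) transform, each approximation coefficient is the sum of a non-overlapping 2 x 2 block of pixels divided by 2. The sketch below shows this with plain NumPy. It assumes even image dimensions, whereas the library may pad odd-sized images; haar_approximation_2d_sketch is a hypothetical name.

```python
import numpy as np


def haar_approximation_2d_sketch(image):
    """Single-level 2D Haar approximation coefficients.

    Assumption: both image dimensions are even (no padding is applied).
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    if rows % 2 or cols % 2:
        raise ValueError("this sketch requires even image dimensions")
    # View the image as a grid of 2 x 2 blocks.
    blocks = img.reshape(rows // 2, 2, cols // 2, 2)
    # Low-pass along rows and columns: (a + b + c + d) / 2 per block.
    return blocks.sum(axis=(1, 3)) / 2.0


approx = haar_approximation_2d_sketch(np.ones((4, 4)))
single = haar_approximation_2d_sketch(np.array([[1, 2], [3, 4]]))
```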

cellphe.features.frame.haralick(cooc: array) → array

Calculates Haralick features from the given cooccurrence matrix.

Parameters:

cooc – Cooccurrence matrix.

Returns:

A Numpy array of size 14 corresponding to each of the features.

cellphe.features.frame.intensity_quantiles(pixels: array) → array

Calculates the coefficient of variation in distance between pixels at different quantiles of intensity.

Parameters:

pixels – A 2D array with 3 columns corresponding to x, y, and intensity.

Returns:

A 1D array with length 9, corresponding to the coefficient of variation between pixel distances at different quantile thresholds (0.1-0.9).

cellphe.features.frame.minimum_box(boundaries: array) → array

Finds the minimum box around some boundary coordinates.

Parameters:

boundaries – A 2D array of [[x1, y1], [x2, y2], …, [xn, yn]] pairs.

Returns:

A 1D numpy array of an [x,y] pair.

cellphe.features.frame.polygon(boundaries: array) → array

Calculates the minimal polygon around a set of points using the Ramer-Douglas-Peucker method. Uses the shapely implementation.

Parameters:

boundaries – A 2D array of [[x1, y1], [x2, y2], …, [xn, yn]] pairs.

Returns:

A 2D array comprising the minimal set of points.

cellphe.features.frame.polygon_angle(points: array) → array

Calculates the interior angles of a polygon.

Parameters:

points – An N x 3 matrix.

Returns:

A 1D array of length N, each entry representing an angle.

cellphe.features.frame.polygon_features(boundaries: array) → array

Derives features from the minimal polygon surrounding the boundary coordinates.

Parameters:

boundaries – A 2D array of [[x1, y1], [x2, y2], …, [xn, yn]] pairs.

Returns:

A 1D array with 4 values:
  • [0] The longest edge
  • [1] The smallest interior angle
  • [2] The variance of the interior angles
  • [3] The variance of the edges

cellphe.features.frame.var_from_centre(boundaries: array) → list[float]

Determines the mean and variance of the distances of the boundary coordinates from the centre.

Parameters:

boundaries – A 2D array of [[x1, y1], [x2, y2], …, [xn, yn]] pairs.

Returns:

A tuple of the mean distance from the centre and the variance.
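A minimal sketch of this calculation follows, under two assumptions that are not stated above: the centre is taken as the mean of the boundary coordinates, and the variance is the sample variance (ddof=1). The name var_from_centre_sketch is hypothetical.

```python
import numpy as np


def var_from_centre_sketch(boundaries):
    """Mean and variance of boundary-point distances from the centre.

    Assumptions: centre = mean of the boundary points; sample variance.
    """
    pts = np.asarray(boundaries, dtype=float)
    centre = pts.mean(axis=0)
    # Euclidean distance of every boundary point from the centre.
    dists = np.linalg.norm(pts - centre, axis=1)
    return float(dists.mean()), float(dists.var(ddof=1))


square = [[0, 0], [2, 0], [2, 2], [0, 2]]
mean_dist, var_dist = var_from_centre_sketch(square)
```

For the corners of a square every point is equally far from the centre, so the variance is zero.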

cellphe.features.helpers module

cellphe.features.helpers

Helper functions for use in both frame and time-series level feature calculations.

cellphe.features.helpers.skewness(x: array) → float

Calculates the skewness of a sample.

Uses the type 2 method in the R e1071::skewness implementation, which is the version used in SAS and SPSS according to the documentation.

Parameters:

x – Sample.

Returns:

A float representing the skewness.
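The type 2 estimator in e1071 rescales the moment-based skewness g1 = m3 / m2^(3/2) by sqrt(n(n-1)) / (n-2). The sketch below illustrates that formula with NumPy. It is not the library's code, and skewness_type2_sketch is a hypothetical name.

```python
import numpy as np


def skewness_type2_sketch(x):
    """Type 2 sample skewness: g1 * sqrt(n * (n - 1)) / (n - 2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("type 2 skewness needs at least 3 observations")
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)  # second central moment
    m3 = np.mean(dev ** 3)  # third central moment
    g1 = m3 / m2 ** 1.5
    return float(g1 * np.sqrt(n * (n - 1)) / (n - 2))


symmetric = skewness_type2_sketch([1, 2, 3, 4, 5])
right_tailed = skewness_type2_sketch([1, 2, 3, 10])
```

A symmetric sample gives a skewness of zero, while a sample with a long right tail gives a positive value.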

cellphe.features.time_series module

cellphe.features.time_series

Functions for extracting features from time-series.

cellphe.features.time_series.ascent(x: array, diff: bool = True) → float

Calculates the ascent of a signal.

This is defined as the sum of the point-to-point positive differences, divided by the total length of the signal.

Parameters:
  • x – Input array.

  • diff – Whether to take the difference first (required for elevation variables but not those from wavelets).

Returns:

A float representing the ascent.
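The definition above can be sketched in a few lines. The sketch assumes that "total length of the signal" means the number of elements in x as passed in; ascent_sketch is a hypothetical name and not the library's code.

```python
import numpy as np


def ascent_sketch(x, diff=True):
    """Sum of the positive point-to-point steps divided by len(x).

    Assumption: the divisor is the length of the input array.
    """
    x = np.asarray(x, dtype=float)
    # With diff=False the input is treated as already differenced.
    steps = np.diff(x) if diff else x
    return float(steps[steps > 0].sum() / len(x))


rising = ascent_sketch([1, 3, 2, 5])
prediffed = ascent_sketch([2, -1, 3], diff=False)
```

For [1, 3, 2, 5] the steps are 2, -1 and 3, so the ascent is (2 + 3) / 4 = 1.25. A descent counterpart would keep the negative steps instead.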

cellphe.features.time_series.calculate_trajectory_area(df) → float

Calculates the trajectory area of a cell.

Parameters:
  • xs – An array of x-coordinates.

  • ys – An array of y-coordinates.

Returns:

The trajectory area as a float.

cellphe.features.time_series.descent(x: array, diff: bool = True) → float

Calculates the descent of a signal.

This is defined as the sum of the point-to-point negative differences, divided by the total length of the signal.

Parameters:
  • x – Input array.

  • diff – Whether to take the difference first (required for elevation variables but not those from wavelets).

Returns:

A float representing the descent.

cellphe.features.time_series.haar_approximation_1d(x: pd.Series)

Haar wavelet approximation for a 1D signal with 3 levels of decomposition.

Parameters:

x – The input signal.

Returns:

A list of length 3 for each level, with each entry containing the detail coefficients.
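To make the 3-level decomposition concrete, the sketch below computes orthonormal Haar detail coefficients level by level with NumPy. Several choices here are assumptions: odd-length signals are padded by repeating the last value, the sign convention is (even sample minus odd sample) / sqrt(2), and haar_details_sketch is a hypothetical name.

```python
import numpy as np


def haar_details_sketch(x, levels=3):
    """Detail coefficients for each level of a 1D Haar decomposition."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            # Assumption: pad odd lengths by repeating the final value.
            approx = np.append(approx, approx[-1])
        evens, odds = approx[0::2], approx[1::2]
        # High-pass output: scaled differences of neighbouring pairs.
        details.append((evens - odds) / np.sqrt(2))
        # Low-pass output feeds the next level.
        approx = (evens + odds) / np.sqrt(2)
    return details


details = haar_details_sketch(np.arange(1, 9))
```

For a signal of length 8 this yields detail arrays of length 4, 2 and 1.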

cellphe.features.time_series.interpolate(df: DataFrame) → DataFrame

Linearly interpolates a dataframe with missing frames.

The resultant dataframe will have more rows than the input if at minimum one cell is missing from one frame. All feature columns will be linearly interpolated during these missing frames.

Parameters:

df – The DataFrame with columns CellID, FrameID and any other feature columns.

Returns:

A DataFrame with the same column structure as df, but with either the same number of rows or greater.
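A minimal pandas sketch of this behaviour is shown below. It assumes that all feature columns are numeric and that gaps are only filled between a cell's first and last observed frame; neither detail is confirmed above, and interpolate_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def interpolate_sketch(df):
    """Insert rows for missing frames and linearly interpolate features.

    Assumptions: numeric feature columns; no extrapolation beyond the
    first and last frame in which each cell was observed.
    """
    pieces = []
    for cell_id, cell in df.groupby("CellID"):
        cell = cell.set_index("FrameID").sort_index()
        # Every frame between this cell's first and last appearance.
        full_range = np.arange(cell.index.min(), cell.index.max() + 1)
        cell = cell.reindex(full_range)
        cell["CellID"] = cell_id
        cell = cell.interpolate(method="linear")
        cell.index.name = "FrameID"
        pieces.append(cell.reset_index())
    # Restore the original column order.
    return pd.concat(pieces, ignore_index=True)[list(df.columns)]


tracked = pd.DataFrame(
    {"CellID": [1, 1], "FrameID": [1, 3], "Area": [10.0, 30.0]}
)
filled = interpolate_sketch(tracked)
```

Here the cell is missing from frame 2, so the result gains one row whose Area is the midpoint of its neighbours.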

cellphe.features.time_series.skewness_positive(x: array) → float

Calculates the skewness of an array.

If the array doesn’t have at least one positive value then it returns 0.

Parameters:

x – Input array.

Returns:

The skewness, or 0 if the array has no positive values.

cellphe.features.time_series.time_series_features(df: DataFrame) → DataFrame

Calculates 15 time-series based features for each frame-level feature.

Parameters:

df – A DataFrame as output from extract.features.

Returns:

A DataFrame with CellID then 15*F+1 columns, where F is the number of feature columns in df.

cellphe.features.time_series.wavelet_features(x: Series) → DataFrame

Calculates the elevation metrics for the detail coefficients from 3 levels of a Haar wavelet approximation.

Parameters:

x – The raw data as an array.

Returns:

A 1-row DataFrame comprising 9 columns, one for each of the 3 elevation metrics for each of the 3 Wavelet levels.

Module contents

cellphe.features

Provides functions relating to extracting features.