flagx.gating

This module provides classification methods for automated flow cytometry gating.

SomClassifier

MLPClassifier

class flagx.gating.MLPClassifier(*args: Any, **kwargs: Any)

Bases: BaseEstimator, ClassifierMixin

A multilayer perceptron (MLP) classifier.

This classifier wraps a fully connected neural network implemented in PyTorch while exposing a scikit-learn–style API. The model supports multi-class classification and automatic device selection (CPU or GPU), and provides the methods fit(), predict(), predict_proba(), score(), save(), and load(). Training uses CrossEntropyLoss and the Adam optimizer.

classes_

Class labels after re-indexing to integers starting from 0.

Type: np.ndarray or None

class_counts_

Class counts from the training data.

Type: np.ndarray or None

og_classes_

Original class labels before re-indexing.

Type: np.ndarray or None

class_priors_

Empirical class priors.

Type: np.ndarray or None

new_to_og_classes_dict_

Mapping from new integer labels back to original labels.

Type: dict[int, Any] or None

data_set_

PyTorch tensor dataset constructed during fitting.

Type: TensorDataset or None

data_loader_

PyTorch DataLoader used for minibatch training.

Type: DataLoader or None

model_

Neural network model.

Type: nn.Module or None

criterion_

Loss function, PyTorch CrossEntropyLoss.

Type: nn.Module or None

optimizer_

PyTorch Adam optimizer. The learning rate is 0.001 unless overridden through optimizer_params.

Type: Optimizer or None

training_log_

Logged losses of the training run.

Type: dict[str, int | list[int | float]] or None

is_fitted_

Whether the classifier has been fitted.

Type: bool
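
The label-related attributes above (classes_, og_classes_, class_counts_, class_priors_, new_to_og_classes_dict_) can be pictured with a small dependency-free sketch. It is not the library's code: the helper name reindex_labels is hypothetical, and sorting the original labels before numbering them is an assumption, since the reference does not state the ordering.

```python
from collections import Counter


def reindex_labels(y):
    """Map arbitrary hashable labels to consecutive integers starting at 0.

    Hypothetical helper; sorted ordering of the original labels is an assumption.
    """
    og_classes = sorted(set(y))                          # original labels
    og_to_new = {og: new for new, og in enumerate(og_classes)}
    new_to_og = {new: og for og, new in og_to_new.items()}
    y_new = [og_to_new[label] for label in y]            # re-indexed targets
    counts = Counter(y_new)
    class_counts = [counts[c] for c in range(len(og_classes))]
    class_priors = [n / len(y) for n in class_counts]    # empirical frequencies
    return y_new, og_classes, new_to_og, class_counts, class_priors


y = ["T cell", "B cell", "T cell", "NK cell", "T cell"]
y_new, og_classes, new_to_og, class_counts, class_priors = reindex_labels(y)

print(y_new)         # [2, 0, 2, 1, 2]
print(new_to_og)     # {0: 'B cell', 1: 'NK cell', 2: 'T cell'}
print(class_counts)  # [1, 1, 3]
print(class_priors)  # [0.2, 0.2, 0.6]
```

The integer labels are what a cross-entropy loss consumes during training, while the reverse mapping lets predictions be reported in the original label encoding.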

Parameters:

layer_sizes (Tuple[int, ...]) – Sizes of the hidden layers in the fully connected neural network.
n_epochs (int) – Number of training epochs.
loss_params (dict[str, Any] or None) – Parameters passed to PyTorch's CrossEntropyLoss.
optimizer_params (dict[str, Any] or None) – Parameters passed to PyTorch's Adam optimizer. If None, defaults to {'lr': 0.001}.
data_loader_params (dict[str, Any] or None) – Parameters passed to the PyTorch DataLoader. If None, defaults to {'batch_size': 128, 'shuffle': True, 'num_workers': 1}.
validation_fraction (float) – Fraction of the training data used as validation set. Defaults to 0.1.
early_stopping (bool) – Whether early stopping is used or not. If early_stopping is True and validation_fraction=0.0, the training loss is used as an early stopping criterion. Defaults to False.
tol (float) – Tolerance for early stopping. When the monitored loss (validation loss, or training loss if no validation set is used) fails to improve by at least tol for n_iter_no_change consecutive epochs, training is stopped early.
n_iter_no_change (int) – Number of consecutive epochs without an improvement of at least tol after which training is stopped early.
device (str or None) – Device to use for training (e.g., 'cpu', 'cuda', 'cuda:0'). If None, CUDA is used when available, otherwise falls back to CPU.
verbosity (int) – Verbosity level for training logs.
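
The interplay of tol and n_iter_no_change can be sketched as follows. This is a minimal illustration, not the library's training loop: the function name early_stopping_epoch is hypothetical, the loss values are made up, and comparing each epoch against the best loss seen so far (rather than against the previous epoch) is an assumption that the reference does not confirm.

```python
def early_stopping_epoch(losses, tol=1e-4, n_iter_no_change=3):
    """Return the 0-based epoch at which training would stop, or None.

    Hypothetical helper: an epoch counts as an improvement only if its loss
    is lower than the best loss so far by more than tol (assumed criterion).
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss - tol:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= n_iter_no_change:
            return epoch
    return None


# Made-up per-epoch validation losses.
losses = [1.0, 0.8, 0.7, 0.69, 0.70, 0.695, 0.60]

stop = early_stopping_epoch(losses, tol=0.05, n_iter_no_change=3)
print(stop)  # 5: epochs 3, 4 and 5 all fail to beat 0.7 by at least 0.05
```

With a larger patience, for example n_iter_no_change=10, the same loss sequence would run to completion and the late improvement at the final epoch would be kept.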

fit(X: numpy.ndarray, y: numpy.ndarray) → typing_extensions.Self

Fit the MLP classifier to the provided training data.

Parameters:

X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – Target labels of shape (n_samples,).

Returns:

The fitted classifier instance.

Return type:

Self

Raises:

ValueError – If X and y have incompatible shapes.

predict(X: numpy.ndarray) → numpy.ndarray

Predict class labels for the given input samples.

Parameters: X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
Returns: Predicted class labels using the original label encoding.
Return type: np.ndarray
Raises: NotFittedError – If predict() is used before calling fit().

predict_proba(X: numpy.ndarray) → numpy.ndarray

Predict class probabilities for the given samples.

Parameters: X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
Returns: Array of shape (n_samples, n_classes) containing class probabilities.
Return type: np.ndarray
Raises: NotFittedError – If predict_proba() is used before calling fit().
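
How predict() and predict_proba() relate to the network's raw outputs can be sketched in plain Python. The sketch assumes that probabilities are obtained by applying softmax to the logits and that the predicted label is the most probable class mapped back through new_to_og_classes_dict_; the reference does not spell out these steps. The function names and the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logits_batch, new_to_og):
    """Return (probabilities, original labels) for a batch of logit rows."""
    probs = [softmax(row) for row in logits_batch]
    # Index of the largest probability, mapped back to the original label.
    labels = [new_to_og[max(range(len(p)), key=p.__getitem__)] for p in probs]
    return probs, labels


new_to_og = {0: "B cell", 1: "NK cell", 2: "T cell"}
logits_batch = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]  # made-up logits

probs, labels = predict_from_logits(logits_batch, new_to_og)
print(labels)  # ['B cell', 'T cell']
```

Each row of probs sums to 1 and has one entry per class, matching the documented (n_samples, n_classes) shape.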

score(X: numpy.ndarray, y: numpy.ndarray, sample_weight: numpy.ndarray | None = None)

Compute the macro F1 score of the classifier on the given dataset.

Parameters:

X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – True labels.
sample_weight (np.ndarray or None) – Optional sample weights.

Returns:

Macro-averaged F1 score.

Return type:

float

Raises:

NotFittedError – If score() is used before calling fit().
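
Note that scikit-learn's default ClassifierMixin.score() reports mean accuracy, whereas this method is documented as returning macro F1. The difference is illustrated by the dependency-free sketch below. It ignores sample_weight, the function name macro_f1 is hypothetical, and the labels are made up; it is not the library's implementation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (sample weights not handled)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 4))                  # 0.8333
print(round(macro_f1(y_true, y_pred), 4))  # 0.8667
```

Because every class contributes equally to the average, macro F1 is less dominated by abundant populations than accuracy, which matters when rare cell populations are of interest.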

save(filename: str = 'mlp_classifier.pkl', filepath: str | None = None) → None

Save the fitted classifier to disk using torch.save.

Parameters:

filename (str) – Name of the file to save the model to.
filepath (str or None) – Directory where the file will be saved. Defaults to the current working directory.

Returns:

None

classmethod load(filename: str = 'mlp_classifier.pkl', filepath: str | None = None, map_location: str | torch.device = 'cpu') → typing_extensions.Self

Load a previously saved classifier from disk.

Parameters:

filename (str) – Name of the saved file.
filepath (str or None) – Directory containing the saved file. Defaults to the current working directory.
map_location (str or torch.device) – Device mapping for loading the model (e.g., 'cpu' or 'cuda').

Returns:

The loaded classifier instance.

Return type:

Self
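
The filename/filepath convention shared by save() and load() can be sketched with the standard library. This is only an illustration of the path handling and of a save/load round trip: pickle is used as a stand-in for torch.save and torch.load, map_location is not modelled, and resolve_path, save_state and load_state are hypothetical names that are not part of flagx.

```python
import pickle
import tempfile
from pathlib import Path


def resolve_path(filename, filepath=None):
    """Join directory and file name; fall back to the current working directory."""
    directory = Path(filepath) if filepath is not None else Path.cwd()
    return directory / filename


def save_state(state, filename="mlp_classifier.pkl", filepath=None):
    """Write a plain dict to disk (pickle stands in for torch.save here)."""
    with open(resolve_path(filename, filepath), "wb") as fh:
        pickle.dump(state, fh)


def load_state(filename="mlp_classifier.pkl", filepath=None):
    """Read the dict back (pickle stands in for torch.load here)."""
    with open(resolve_path(filename, filepath), "rb") as fh:
        return pickle.load(fh)


state = {"og_classes_": ["B cell", "T cell"], "is_fitted_": True}

with tempfile.TemporaryDirectory() as tmp:
    save_state(state, filepath=tmp)
    restored = load_state(filepath=tmp)

print(restored == state)  # True
```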

Neural Network Models

class flagx.gating.FCNNModel(*args: Any, **kwargs: Any)

Bases: Module

Fully connected neural network with an arbitrary number of hidden linear layers of arbitrary size.

All layers except the output layer use ReLU activations. Softmax is intentionally omitted from the final layer because torch.nn.CrossEntropyLoss expects raw logits.

The default parameters follow the configuration described in:

DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data (Cheng et al., 2022).

Their implementation can be found at:

https://github.com/lijcheng12/DGCyTOF/blob/main/Code_Study/DGCyTOF/CyTOF2/CyTOF2.ipynb (22/27/2025).

Parameters:

in_size (int) – Number of input features.
out_size (int) – Number of output classes.
layer_sizes (Tuple[int, ...], optional) – Sizes of the hidden layers. Defaults to (128, 64, 32).

layers

List of fully connected linear layers.

Type: nn.ModuleList

forward(x: torch.Tensor) → torch.Tensor

Forward pass of the FCNN model. ReLU activation is applied after each layer except after the output layer.

Parameters: x (torch.Tensor) – Input data tensor.
Returns: Raw output logits with shape (batch_size, out_size).
Return type: torch.Tensor