flagx.gating
This module provides classification methods for automated flow cytometry gating.
SOMClassifier
- class flagx.gating.SOMClassifier(*args: Any, **kwargs: Any)
Bases:
BaseEstimator, ClassifierMixin
Self-Organizing Map (SOM) classifier with a scikit-learn–compatible API.
This classifier uses Somoclu to train a 2D SOM grid in an unsupervised fashion and assigns class labels to SOM units by majority vote across labeled training samples. Predictions are computed using the best-matching unit (BMU) for each sample and the majority class associated with that unit.
- The classifier supports:
Unsupervised SOM training
Supervised unit annotation
Class probability estimation
Hyperparameter tuning (via GridSearchCV)
SOM quality metrics (quantization error, topographic error)
Visualization-oriented transformations
Model saving and loading
- som_
Trained SOM object.
- Type:
Somoclu
- is_fitted_
Whether the model has been fitted.
- Type:
bool
- classes_
Class labels after re-indexing to integers starting from 0.
- Type:
np.ndarray or None
- class_counts_
Class counts from the training data.
- Type:
np.ndarray or None
- og_classes_
Original class labels before re-indexing.
- Type:
np.ndarray or None
- class_priors_
Empirical class priors.
- Type:
np.ndarray or None
- som_unit_labels_
Majority class per SOM unit.
- Type:
np.ndarray
- class_counts_per_unit_
Class histogram per SOM unit.
- Type:
np.ndarray
- grid_search_
Grid search results if hyperparameter tuning was performed.
- Type:
GridSearchCV or None
- Parameters:
som_topology (Literal['planar', 'toroid']) – SOM grid topology. Defaults to 'planar'.
som_grid_type (Literal['rectangular', 'hexagonal']) – Grid layout type. Defaults to 'rectangular'.
som_dimensions (Tuple[int, int]) – Dimensions of the SOM grid (n_columns, n_rows). Defaults to (10, 10).
neighborhood (Literal['gaussian', 'bubble']) – Neighborhood function type. Defaults to 'gaussian'.
gaussian_neighborhood_sigma (float or None) – Sigma for Gaussian neighborhood function. Defaults to 0.1.
initialization (Literal['random', 'pca']) – Codebook initialization method. Defaults to 'pca'.
initial_codebook (np.ndarray or None) – Custom initialization of SOM weights. Defaults to None.
n_epochs (int) – Number of SOM training epochs. Defaults to 100.
radius_0 (float) – Initial neighborhood radius. Negative values are interpreted as fractions of the grid size. Defaults to -0.5.
radius_n (float) – Final neighborhood radius. Defaults to 0.1.
radius_cooling (Literal['linear', 'exponential']) – Radius decay schedule. Defaults to 'exponential'.
learning_rate_0 (float) – Initial learning rate. Defaults to 0.1.
learning_rate_n (float) – Final learning rate. Defaults to 0.001.
learning_rate_decay (Literal['linear', 'exponential']) – Learning rate decay schedule. Defaults to 'exponential'.
unlabeled_label (Any) – Label indicating unlabeled samples. Defaults to -999.
verbosity (int) – Logging level. Defaults to 1.
- fit(X: numpy.ndarray, y: numpy.ndarray) typing_extensions.Self
Train the SOM on input data and annotate units if labeled data is provided.
- Parameters:
X (np.ndarray) – Training features of shape (n_samples, n_features).
y (np.ndarray) – Training labels. Unlabeled samples must be marked using unlabeled_label.
- Returns:
The fitted classifier instance.
- Return type:
Self
- Raises:
ValueError – If the feature dimension does not match a previous fit call.
UserWarning – If fitting continues from an already-initialized SOM.
- predict(X: numpy.ndarray) numpy.ndarray
Predict labels for new samples using the BMU and unit annotations.
- Parameters:
X (np.ndarray) – Input feature matrix.
- Returns:
Predicted labels in the original label space.
- Return type:
np.ndarray
- Raises:
NotFittedError – If the classifier has not been fitted.
UserWarning – If units without labels are BMU for some samples.
- predict_proba(X: numpy.ndarray) numpy.ndarray
Estimate class probabilities based on the class distribution of the BMU.
- Parameters:
X (np.ndarray) – Input feature matrix.
- Returns:
Class probabilities per sample.
- Return type:
np.ndarray
- Raises:
NotFittedError – If the classifier has not been fitted.
UserWarning – If no labeled data was provided.
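The idea of deriving class probabilities from the class distribution of the BMU can be illustrated with a small NumPy sketch. This is not the library's implementation: the count table, the BMU indices, and the fallback to class priors for units without labeled support are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical class histogram per SOM unit, shape (n_units, n_classes).
class_counts_per_unit = np.array([[8, 2], [1, 3], [0, 0]])
# Assumed fallback distribution for units that received no labeled samples.
class_priors = np.array([0.5, 0.5])
# Hypothetical flat BMU index for each of four samples.
bmus = np.array([0, 1, 1, 2])

# Normalize each unit's histogram to a probability distribution.
totals = class_counts_per_unit.sum(axis=1, keepdims=True)
safe_totals = np.where(totals == 0, 1, totals)  # avoid division by zero
unit_proba = np.where(totals > 0, class_counts_per_unit / safe_totals, class_priors)

# Each sample inherits the distribution of its BMU.
proba = unit_proba[bmus]
print(proba)
```

Each row of `proba` sums to 1, and samples that share a BMU receive identical probabilities.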
- annotate_som(X: numpy.ndarray, y: numpy.ndarray) typing_extensions.Self
Assign class labels to SOM units by computing the majority class among samples for which the respective unit is the BMU.
- Parameters:
X (np.ndarray) – Input features for annotation.
y (np.ndarray) – Labels corresponding to X.
- Returns:
Updated classifier instance with unit annotations.
- Return type:
Self
- Raises:
RuntimeError – If SOM has not been trained prior to annotation.
UserWarning – If some SOM units have no support from labeled samples.
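A minimal NumPy sketch of majority-vote unit annotation follows. It assumes a flat codebook of shape (n_units, n_features), labels already re-indexed to integers starting from 0, and a hypothetical marker of -1 for units without labeled support; the helper names `find_bmus` and `annotate_units` are illustrative and not part of flagx.

```python
import numpy as np

def find_bmus(X, codebook):
    """Return the index of the closest codebook vector (BMU) for each sample."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_units).
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def annotate_units(bmus, y, n_units, n_classes, unlabeled_label=-999):
    """Build a class histogram per unit and pick the majority class."""
    labeled = y != unlabeled_label
    counts = np.zeros((n_units, n_classes), dtype=int)
    # Count one vote per labeled sample at (its BMU, its class).
    np.add.at(counts, (bmus[labeled], y[labeled]), 1)
    unit_labels = counts.argmax(axis=1)
    # Hypothetical marker: -1 for units that no labeled sample maps to.
    unit_labels[counts.sum(axis=1) == 0] = -1
    return unit_labels, counts

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
X = np.array([[0.1, 0.0], [0.0, 0.2], [0.9, 1.1], [1.0, 0.8], [1.2, 1.0]])
y = np.array([0, 0, 1, 0, 1])

bmus = find_bmus(X, codebook)
unit_labels, counts = annotate_units(bmus, y, n_units=3, n_classes=2)
print(unit_labels)  # majority class per unit, -1 where unsupported
```

In this toy data the third unit is never a BMU, which corresponds to the situation the UserWarning above describes.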
- hyperparameter_tuning(X: numpy.ndarray, y: numpy.ndarray, param_grid: Dict | None = None, cv: int | sklearn.model_selection.BaseCrossValidator | Iterable | None = 5, scoring: str | Callable | List | Tuple | Dict | None = 'internal', refit: bool | str | Callable = True, gridsearchcv_kwargs: Dict | None = None) typing_extensions.Self
Perform hyperparameter optimization using Scikit-learn’s GridSearchCV.
- Parameters:
X (np.ndarray) – Feature matrix.
y (np.ndarray) – Labels.
param_grid (dict or None) – Hyperparameter search space.
cv (int or CrossValidator) – Number of folds or cross-validation strategy. Defaults to 5.
scoring (str or callable or None) – Scoring metric. If 'internal', macro-F1 is used. Defaults to 'internal'.
refit (bool or str or callable) – Whether to refit using the best model. Defaults to True.
gridsearchcv_kwargs (dict or None) – Additional parameters for GridSearchCV. Defaults to None.
- Returns:
Classifier with updated best-found parameters.
- Return type:
Self
Notes
The fitted GridSearchCV object is stored on the instance in the grid_search_ attribute.
- score(X: numpy.ndarray, y: numpy.ndarray, sample_weight: numpy.ndarray | None = None)
Compute macro F1 score on the provided data.
- Parameters:
X (np.ndarray) – Feature matrix.
y (np.ndarray) – True labels.
sample_weight (np.ndarray or None) – Optional sample weights.
- Returns:
Macro-averaged F1 score.
- Return type:
float
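For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it with plain NumPy; it ignores sample weights and is meant only to illustrate the metric, not to reproduce how the classifier computes it internally.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (sample weights not handled)."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    f1_scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_scores))

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])
score = macro_f1(y_true, y_pred)
print(score)
```

Because every class contributes equally, rare populations influence the score as much as abundant ones, which is often desirable for imbalanced cytometry data.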
- activation_frequencies(X: numpy.ndarray)
Compute activation frequencies of each SOM unit on the given data.
- Parameters:
X (np.ndarray) – Input features.
- Returns:
Array of shape (som_dim0, som_dim1) with normalized activation counts per unit.
- Return type:
np.ndarray
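As a rough illustration, activation frequencies can be obtained by counting how often each unit is the BMU and dividing by the number of samples. The BMU coordinates below are made up, and normalizing by the total count is an assumption about what "normalized" means here.

```python
import numpy as np

n_rows, n_cols = 2, 3
# Hypothetical BMU grid coordinates (row, column) for six samples.
bmus = np.array([[0, 0], [0, 0], [1, 2], [0, 1], [1, 2], [1, 2]])

counts = np.zeros((n_rows, n_cols))
# Accumulate one hit per sample at its BMU position.
np.add.at(counts, (bmus[:, 0], bmus[:, 1]), 1)

# Assumed normalization: divide by the total number of samples.
freq = counts / counts.sum()
print(freq)
```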
- quantization_error(X: numpy.ndarray) float
Compute the SOM quantization error.
Quantization error = mean Euclidean distance between samples and the codebook vector of their BMU.
- Parameters:
X (np.ndarray) – Input features.
- Returns:
Mean quantization error.
- Return type:
float
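The definition above translates directly into a few lines of NumPy. The sketch assumes a flat codebook of shape (n_units, n_features); the function name mirrors the method but is a standalone illustration.

```python
import numpy as np

def quantization_error(X, codebook):
    """Mean Euclidean distance between each sample and its BMU's codebook vector."""
    # Distances from every sample to every unit, shape (n_samples, n_units).
    dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    # The BMU is the unit with the smallest distance.
    return float(dists.min(axis=1).mean())

codebook = np.array([[0.0, 0.0], [3.0, 4.0]])
X = np.array([[0.0, 1.0], [3.0, 4.0], [3.0, 1.0]])
qe = quantization_error(X, codebook)
print(qe)
```

Here the three samples lie at distances 1, 0, and 3 from their BMUs, so the error is their mean.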
- topographic_error(X: numpy.ndarray) float
Compute the SOM topographic error.
Topographic error = proportion of samples where the 1st and 2nd BMUs are not adjacent on the SOM grid.
- Parameters:
X (np.ndarray) – Input feature matrix.
- Returns:
Topographic error.
- Return type:
float
- Raises:
NotImplementedError – If SOM topology is not planar rectangular.
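The sketch below illustrates the topographic error on a planar rectangular grid. It assumes a codebook of shape (n_rows, n_cols, n_features) flattened in row-major order, and it treats two units as adjacent when they differ by at most one step in each grid direction (an 8-neighborhood); the library may use a different adjacency rule.

```python
import numpy as np

def topographic_error(X, codebook, n_rows, n_cols):
    """Fraction of samples whose first and second BMUs are not grid neighbors."""
    flat = codebook.reshape(n_rows * n_cols, -1)
    dists = np.linalg.norm(X[:, None, :] - flat[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    first, second = order[:, 0], order[:, 1]
    # Convert flat unit indices back to (row, column) grid positions.
    r1, c1 = np.divmod(first, n_cols)
    r2, c2 = np.divmod(second, n_cols)
    # Assumed adjacency: Chebyshev distance of at most 1 on the grid.
    adjacent = np.maximum(np.abs(r1 - r2), np.abs(c1 - c2)) <= 1
    return float(np.mean(~adjacent))

# A 1 x 4 grid with one feature; units 1 and 3 are close in feature space
# but two columns apart on the grid.
codebook = np.array([[[0.0], [1.0], [5.0], [1.1]]])
X = np.array([[0.2], [1.06], [4.0]])
te = topographic_error(X, codebook, n_rows=1, n_cols=4)
print(te)
```

Only the second sample has non-adjacent first and second BMUs, so one out of three samples counts as a topographic error.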
- unit_impurity(impurity_measure: Literal['entropy', 'gini'] = 'entropy') numpy.ndarray
Compute class impurity for each SOM unit.
- Parameters:
impurity_measure (Literal['entropy', 'gini']) – Impurity metric. Defaults to 'entropy'.
- Returns:
Impurity per SOM unit.
- Return type:
np.ndarray
- Raises:
UserWarning – If classifier was trained without labeled data.
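Both impurity measures can be computed from a per-unit class histogram such as class_counts_per_unit_. The following standalone sketch assumes base-2 entropy and assigns an impurity of 0 to units without labeled samples; both choices are assumptions and may differ from the library.

```python
import numpy as np

def unit_impurity(class_counts_per_unit, impurity_measure="entropy"):
    """Impurity of the class distribution in each unit (one value per unit)."""
    counts = np.asarray(class_counts_per_unit, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Class proportions per unit; rows without samples stay all-zero.
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    if impurity_measure == "entropy":
        # Assumed base-2 logarithm; 0 * log(0) is treated as 0.
        logp = np.log2(p, out=np.zeros_like(p), where=p > 0)
        return -(p * logp).sum(axis=1)
    if impurity_measure == "gini":
        impurity = 1.0 - (p ** 2).sum(axis=1)
        impurity[totals[:, 0] == 0] = 0.0  # assumed value for empty units
        return impurity
    raise ValueError(f"Unknown impurity measure: {impurity_measure!r}")

counts = np.array([[5, 5], [10, 0], [0, 0]])
entropy = unit_impurity(counts, "entropy")
gini = unit_impurity(counts, "gini")
print(entropy, gini)
```

A perfectly mixed two-class unit reaches the maximum (entropy 1, Gini 0.5), while a pure unit has impurity 0 under both measures.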
- mean_impurity(impurity_measure: Literal['entropy', 'gini'] = 'entropy') float
Compute the mean impurity across all SOM units.
- Parameters:
impurity_measure (Literal['entropy', 'gini']) – Impurity metric. Defaults to 'entropy'.
- Returns:
Mean impurity over all units.
- Return type:
float
- unpredictable_classes() numpy.ndarray
Identify classes that were seen during training but cannot be predicted because no SOM unit was annotated with those labels.
- Returns:
Array of missing/unpredictable classes.
- Return type:
np.ndarray
- Raises:
UserWarning – If no labeled data was provided.
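Conceptually, the unpredictable classes are the set difference between the classes seen in training and the labels that appear among the unit annotations. The arrays below are invented for illustration, with -1 as a hypothetical marker for unannotated units.

```python
import numpy as np

# Hypothetical re-indexed training classes.
classes = np.array([0, 1, 2, 3])
# Hypothetical majority label per unit; -1 marks a unit without annotation.
som_unit_labels = np.array([0, 0, 2, -1, 2, 0])

# Classes that no unit was annotated with can never be predicted.
unpredictable = np.setdiff1d(classes, som_unit_labels)
print(unpredictable)
```

Rare populations are the typical candidates: if they never win the majority vote in any unit, a larger grid or rebalanced training data may be needed.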
- save(filename: str = 'som_classifier.pkl', filepath: str | None = None) None
Save the trained classifier to disk using pickle.
- Parameters:
filename (str) – Output filename.
filepath (str or None) – Directory to save the file. Defaults to CWD.
- Returns:
None
- classmethod load(filename: str = 'som_classifier.pkl', filepath: str | None = None) typing_extensions.Self
Load a saved classifier instance from disk.
- Parameters:
filename (str) – File to load.
filepath (str or None) – Directory containing the file.
- Returns:
Loaded classifier instance.
- Return type:
Self
- reset()
Reset the classifier to its untrained state, clearing the trained SOM, class annotations, and metadata.
- Returns:
None
- transform(X: numpy.ndarray) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray]
Project samples onto the SOM grid and generate visualization-friendly scattered BMU coordinates.
- Parameters:
X (np.ndarray) – Input data.
- Returns:
bmus (np.ndarray): BMU coordinates for each sample. bmus_scattered (np.ndarray): Scattered BMU coordinates for visualization. som_unit_ids (np.ndarray): Unit ID in row-major format for each sample. radii (np.ndarray): Radius proportional to activation frequency across input data of BMU for each sample.
- Return type:
Tuple
- Raises:
NotFittedError – If the classifier has not been trained.
MLPClassifier
- class flagx.gating.MLPClassifier(*args: Any, **kwargs: Any)
Bases:
BaseEstimator, ClassifierMixin
A three-layer perceptron (MLP) classifier.
This classifier wraps a fully connected neural network implemented in PyTorch while exposing a scikit-learn–style API. The model supports multi-class classification, automatic device selection (CPU or GPU), and provides the methods fit(), predict(), predict_proba(), score(), save(), and load(). For model training, CrossEntropyLoss and the Adam optimizer are used.
- classes_
Class labels after re-indexing to integers starting from 0.
- Type:
np.ndarray or None
- class_counts_
Class counts from the training data.
- Type:
np.ndarray or None
- og_classes_
Original class labels before re-indexing.
- Type:
np.ndarray or None
- class_priors_
Empirical class priors.
- Type:
np.ndarray or None
- new_to_og_classes_dict_
Mapping from new integer labels back to original labels.
- Type:
dict[int, Any] or None
- data_set_
PyTorch tensor dataset constructed during fitting.
- Type:
TensorDataset or None
- data_loader_
PyTorch DataLoader used for minibatch training.
- Type:
DataLoader or None
- model_
Neural network model.
- Type:
nn.Module or None
- criterion_
Loss function, PyTorch CrossEntropyLoss.
- Type:
nn.Module or None
- optimizer_
PyTorch Adam optimizer with learning rate 0.001.
- Type:
Optimizer or None
- training_log_
Logged losses of the training run.
- Type:
dict[str, int | list[int | float]] or None
- is_fitted_
Whether the classifier has been fitted.
- Type:
bool
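The relationship between classes_, og_classes_, class_counts_, class_priors_, and new_to_og_classes_dict_ can be pictured with np.unique. This is a sketch of the re-indexing idea using made-up labels, not necessarily how the classifier builds these attributes.

```python
import numpy as np

y = np.array(["T cell", "B cell", "T cell", "NK cell", "T cell"])

# Sorted original labels, integer codes per sample, and counts per class.
og_classes, y_reindexed, class_counts = np.unique(
    y, return_inverse=True, return_counts=True
)
classes = np.arange(len(og_classes))            # labels re-indexed from 0
class_priors = class_counts / class_counts.sum()  # empirical class priors
new_to_og = {int(i): label for i, label in enumerate(og_classes)}

# Integer predictions can be mapped back to the original label space.
y_restored = og_classes[y_reindexed]
print(new_to_og)
```

Working with contiguous integer labels internally is what allows a loss such as CrossEntropyLoss to be applied, while predictions are still reported in the original label encoding.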
- Parameters:
layer_sizes (Tuple[int, ...]) – Sizes of the hidden layers in the fully connected neural network.
n_epochs (int) – Number of training epochs.
loss_params (dict[str, Any] or None) – Parameters passed to PyTorch's CrossEntropyLoss.
optimizer_params (dict[str, Any] or None) – Parameters passed to PyTorch's Adam optimizer. If None, defaults to {'lr': 0.001}.
data_loader_params (dict[str, Any] or None) – Parameters passed to the PyTorch DataLoader. If None, defaults to {'batch_size': 128, 'shuffle': True, 'num_workers': 1}.
validation_fraction (float) – Fraction of the training data used as validation set. Defaults to 0.1.
early_stopping (bool) – Whether early stopping is used. If early_stopping is True and validation_fraction=0.0, the training loss is used as the early stopping criterion. Defaults to False.
tol (float) – Tolerance for early stopping. When the validation/training loss does not improve by at least tol for n_iter_no_change consecutive epochs, training is stopped early.
n_iter_no_change (int) – Maximum number of consecutive epochs without an improvement of at least tol.
device (str or None) – Device to use for training (e.g., 'cpu', 'cuda', 'cuda:0'). If None, CUDA is used when available, otherwise falls back to CPU.
verbosity (int) – Verbosity level for training logs.
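The interaction of tol and n_iter_no_change can be sketched as a small patience counter over per-epoch losses. The helper below is hypothetical and only illustrates the stopping rule described above; the exact comparison used by the classifier may differ.

```python
def early_stopping_epoch(losses, tol=1e-4, n_iter_no_change=3):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best - tol:
            # Improvement of at least tol: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= n_iter_no_change:
            return epoch
    return None

# Hypothetical validation losses: progress stalls after the third epoch.
losses = [1.0, 0.8, 0.7, 0.69995, 0.70, 0.71, 0.5]
stop_epoch = early_stopping_epoch(losses)
print(stop_epoch)
```

Note that the improvement at the final epoch is never reached, because training already stopped after three consecutive epochs without sufficient improvement.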
- fit(X: numpy.ndarray, y: numpy.ndarray) typing_extensions.Self
Fit the MLP classifier to the provided training data.
- Parameters:
X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – Target labels of shape (n_samples,).
- Returns:
The fitted classifier instance.
- Return type:
Self
- Raises:
ValueError – If X and y have incompatible shapes.
- predict(X: numpy.ndarray) numpy.ndarray
Predict class labels for the given input samples.
- Parameters:
X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
- Returns:
Predicted class labels using the original label encoding.
- Return type:
np.ndarray
- Raises:
NotFittedError – If predict() is used before calling fit().
- predict_proba(X: numpy.ndarray) numpy.ndarray
Predict class probabilities for the given samples.
- Parameters:
X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
- Returns:
Array of shape (n_samples, n_classes) containing class probabilities.
- Return type:
np.ndarray
- Raises:
NotFittedError – If predict_proba() is used before calling fit().
- score(X: numpy.ndarray, y: numpy.ndarray, sample_weight: numpy.ndarray | None = None)
Compute the macro F1 score of the classifier on the given dataset.
- Parameters:
X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – True labels.
sample_weight (np.ndarray or None) – Optional sample weights.
- Returns:
Macro-averaged F1 score.
- Return type:
float
- Raises:
NotFittedError – If score() is used before calling fit().
- save(filename: str = 'mlp_classifier.pkl', filepath: str | None = None) None
Save the fitted classifier to disk using torch.save.
- Parameters:
filename (str) – Name of the file to save the model to.
filepath (str or None) – Directory where the file will be saved. Defaults to current working directory.
- Returns:
None
- classmethod load(filename: str = 'mlp_classifier.pkl', filepath: str | None = None, map_location: str | torch.device = 'cpu') typing_extensions.Self
Load a previously saved classifier from disk.
- Parameters:
filename (str) – Name of the saved file.
filepath (str or None) – Directory containing the saved file. Defaults to current working directory.
map_location (str or torch.device) – Device mapping for loading the model (e.g., 'cpu' or 'cuda').
- Returns:
The loaded classifier instance.
- Return type:
Self
Neural Network Models
- class flagx.gating.FCNNModel(*args: Any, **kwargs: Any)
Bases:
Module
Fully connected neural network with an arbitrary number of hidden linear layers of arbitrary size.
All layers except the output layer use ReLU activations. Softmax is intentionally omitted from the final layer because torch.nn.CrossEntropyLoss expects raw logits.
The default parameters follow the configuration described in:
DGCyTOF: Deep learning with graphic cluster visualization to predict cell types of single cell mass cytometry data (Cheng et al., 2022).
Their implementation can be found at:
https://github.com/lijcheng12/DGCyTOF/blob/main/Code_Study/DGCyTOF/CyTOF2/CyTOF2.ipynb (22/27/2025).
- Parameters:
in_size (int) – Number of input features.
out_size (int) – Number of output classes.
layer_sizes (Tuple[int, ...], optional) – Sizes of the hidden layers. Defaults to (128, 64, 32).
- layers
List of fully connected linear layers.
- Type:
nn.ModuleList
- forward(x: torch.Tensor) torch.Tensor
Forward pass of the FCNN model. ReLU activation is applied after each layer except after the output layer.
- Parameters:
x (torch.Tensor) – Input data tensor.
- Returns:
Raw output logits with shape (batch_size, out_size).
- Return type:
torch.Tensor
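The forward pass can be illustrated without PyTorch. The NumPy sketch below mimics the described structure (ReLU after every layer except the output, raw logits returned) with arbitrary random weights; the helper names and the initialization are assumptions for illustration and do not reflect the actual FCNNModel code.

```python
import numpy as np

def init_layers(in_size, out_size, layer_sizes=(128, 64, 32), seed=0):
    """Create (weight, bias) pairs for each linear layer with arbitrary weights."""
    rng = np.random.default_rng(seed)
    sizes = (in_size, *layer_sizes, out_size)
    return [
        (rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, layers):
    """Apply ReLU after every layer except the last; return raw logits."""
    for weight, bias in layers[:-1]:
        x = np.maximum(x @ weight + bias, 0.0)
    weight, bias = layers[-1]
    return x @ weight + bias

def softmax(logits):
    """Numerically stable softmax, applied only when probabilities are needed."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

layers = init_layers(in_size=10, out_size=4)
x = np.random.default_rng(1).normal(size=(5, 10))
logits = forward(x, layers)
probs = softmax(logits)
print(logits.shape, probs.sum(axis=1))
```

Keeping softmax out of the network means the same logits can feed a cross-entropy loss during training and be converted to probabilities only at prediction time.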