flagx.pipeline
The FLAG-X GatingPipeline offers higher-level access to the functionality implemented in the io, gating, and dimred modules.
It orchestrates:
- training: data loading and processing, model training, and pipeline saving;
- inference on new data: pipeline loading, automated gating, dimensionality reduction, and export of results to FCS.
- class flagx.pipeline.GatingPipeline(train_data_file_path: str | None = None, train_data_file_names: List[str] | None = None, train_data_file_type: Literal['fcs', 'csv', 'lmd'] | None = None, save_path: str | None = None, channels: List[int] | List[str] | None = None, label_key: int | str | None = None, compensate: bool = False, channel_names_alignment_kwargs: Dict[str, Any] | None = None, relabel_data_kwargs: Dict[str, Any] | None = None, preprocessing_kwargs: Dict[str, Any] | None = None, downsampling_kwargs: Dict[str, Any] | None = None, gating_method: Literal['som', 'mlp'] = 'som', gating_method_kwargs: Dict[str, Any] | None = None, prediction_threshold: float | None = None, verbosity: int = 1)
Bases: object
End-to-end flow cytometry gating pipeline supporting preprocessing, downsampling, dimensionality reduction, and supervised or unsupervised gating.
This class orchestrates the full workflow:
1. Load raw FCS/CSV/LMD files
2. Align channel names
3. (Optional) Relabel training data
4. (Optional) Preprocess data sample-wise
5. (Optional) Downsample data
6. Train the gating module
   - supervised: MLP classifier
   - supervised or unsupervised: SOM classifier
7. Inference on new samples
   - Optional dimensionality reduction (UMAP, SOM, PCA, t-SNE, etc.)
   - Export annotated FCS files
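The optional relabeling and sample-wise downsampling steps can be pictured with the minimal sketch below. It is not FLAG-X code: the helper names (relabel, downsample, prepare_samples) are hypothetical, each sample is assumed to be a plain (events, labels) pair, relabeling is assumed to be a simple old-to-new label mapping, and downsampling is assumed to be uniform random subsampling to a fixed event count. The real options are defined by FlowDataManager.relabel_data() and FlowDataManager.sample_wise_downsampling(); compensation and sample-wise preprocessing are omitted here.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical data layout: (events, labels), one label per event.
Sample = Tuple[List[List[float]], List[int]]


def relabel(labels: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Map old label values to new ones; labels missing from the mapping are kept."""
    return [mapping.get(label, label) for label in labels]


def downsample(sample: Sample, n_events: int, seed: Optional[int] = None) -> Sample:
    """Randomly keep at most n_events events of a single sample (assumed strategy)."""
    events, labels = sample
    if len(events) <= n_events:
        return events, labels
    rng = random.Random(seed)
    # Sort the chosen indices so retained events keep their original order.
    keep = sorted(rng.sample(range(len(events)), n_events))
    return [events[i] for i in keep], [labels[i] for i in keep]


def prepare_samples(
    samples: List[Sample], mapping: Dict[int, int], n_events: int, seed: int = 0
) -> List[Sample]:
    """Relabel, then downsample, each sample independently (sample-wise)."""
    prepared = []
    for events, labels in samples:
        prepared.append(downsample((events, relabel(labels, mapping)), n_events, seed))
    return prepared
```

Because each sample is downsampled on its own, small samples are never padded and large samples cannot dominate the pooled training data.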
- train_data_file_path
Path to directory containing training data. Defaults to CWD.
- Type:
str or None
- train_data_file_names
Specific training filenames to load. If None, uses all files in the directory.
- Type:
list[str] or None
- train_data_file_type
Input file type. If None, inferred from the first filename.
- Type:
Literal['fcs', 'csv', 'lmd'] or None
- save_path
Output directory for pipeline metadata and results. If None, defaults to CWD.
- Type:
str or None
- channels
Indices or names of channels to train on.
- Type:
list[int] or list[str] or None
- label_key
Key to labels. Can be a column index in .X, a channel name (key in var_names), or a key to .obs. If None, only unsupervised SOM training is available.
- Type:
int, str, or None
- compensate
Whether to apply compensation. Defaults to False. See FlowDataManager.sample_wise_compensation().
- Type:
bool
- channel_names_alignment_kwargs
Arguments forwarded to channel alignment. See FlowDataManager.align_channel_names().
- Type:
dict or None
- relabel_data_kwargs
Mapping for relabeling training data. See FlowDataManager.relabel_data().
- Type:
dict or None
- preprocessing_kwargs
Sample-wise preprocessing configuration. See FlowDataManager.sample_wise_preprocessing().
- Type:
dict or None
- downsampling_kwargs
Sample-wise downsampling configuration. See FlowDataManager.sample_wise_downsampling().
- Type:
dict or None
- gating_method
Which model to train.
- Type:
Literal['som', 'mlp']
- gating_method_kwargs
Additional arguments for SOM or MLP.
- Type:
dict or None
- prediction_threshold
Binary case: prediction threshold. Defaults to 0.5.
Multiclass case: if the prediction certainty is below the threshold, the classifier abstains from making a prediction and the event is labeled -1. Defaults to 0.0, i.e., no abstention.
- Type:
float or None
- verbosity
Logging level.
- Type:
int
- is_trained_
Whether the pipeline has been successfully trained.
- Type:
bool
- gating_module_
The fitted gating model.
- Type:
SOMClassifier or MLPClassifier or None
- binary_classes_
Whether the task is binary classification.
- Type:
bool or None
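The two meanings of prediction_threshold can be illustrated with a small sketch. It assumes the classifier yields one probability per class for each event, that in the binary case column 1 holds the positive-class probability, and that a probability exactly equal to the threshold passes. resolve_threshold and apply_threshold are hypothetical names, not FLAG-X functions; only the defaults (0.5 binary, 0.0 multiclass) and the -1 abstention label come from the attribute description.

```python
from typing import List, Optional, Sequence


def resolve_threshold(threshold: Optional[float], binary: bool) -> float:
    """Fill in the documented defaults: 0.5 for binary, 0.0 for multiclass."""
    if threshold is not None:
        return threshold
    return 0.5 if binary else 0.0


def apply_threshold(
    probabilities: Sequence[Sequence[float]],
    binary: bool,
    threshold: Optional[float] = None,
) -> List[int]:
    """Convert per-event class probabilities into labels (hypothetical helper)."""
    t = resolve_threshold(threshold, binary)
    labels = []
    for probs in probabilities:
        if binary:
            # Assumption: column 1 is the probability of the positive class.
            labels.append(1 if probs[1] >= t else 0)
        else:
            best = max(range(len(probs)), key=lambda k: probs[k])
            # Abstain with -1 when the top-class probability is below the threshold.
            labels.append(best if probs[best] >= t else -1)
    return labels
```

With the multiclass default of 0.0 the abstention branch never fires, since probabilities are non-negative, which matches the documented "no abstention" default.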
- Parameters:
train_data_file_path (str or None) – Path to directory containing training data. Defaults to CWD.
train_data_file_names (list[str] or None) – Specific training filenames to load. If None, uses all files in directory.
train_data_file_type (Literal['fcs', 'csv', 'lmd'] or None) – Input file type. If None, inferred from the first filename.
save_path (str or None) – Output directory for pipeline metadata and results. If None, defaults to CWD.
channels (list[int] or list[str] or None) – Indices or names of channels to train on.
label_key (int, str, or None) – Key to labels in .X, .obs, or .layers. If None, only unsupervised SOM training is available.
compensate (bool) – Whether to apply compensation or not. Defaults to False.
channel_names_alignment_kwargs (dict or None) – Arguments forwarded to channel alignment.
relabel_data_kwargs (dict or None) – Mapping for relabeling training data.
preprocessing_kwargs (dict or None) – Sample-wise preprocessing configuration.
downsampling_kwargs (dict or None) – Sample-wise downsampling configuration.
gating_method (Literal['som', 'mlp']) – Which model to train.
gating_method_kwargs (dict or None) – Additional arguments for SOM or MLP.
prediction_threshold (float or None) – Binary prediction threshold (defaults to 0.5) or multiclass abstention threshold (defaults to 0.0, i.e., no abstention).
verbosity (int) – Logging level.
- Returns:
None
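Two constructor inputs are resolved from context: train_data_file_type falls back to the first filename, and label_key may point into .X, var_names, or .obs. The sketch below shows one plausible resolution order under stated assumptions: a plain-Python stand-in for an AnnData-like container (X as a list of rows, var_names, an obs dict), file type taken from the file extension, and channel names checked before .obs keys. infer_file_type and resolve_labels are hypothetical helpers, not FLAG-X functions.

```python
import os
from typing import Any, Dict, List, Optional, Sequence, Union


def infer_file_type(file_names: Sequence[str]) -> str:
    """Infer 'fcs', 'csv' or 'lmd' from the extension of the first filename."""
    ext = os.path.splitext(file_names[0])[1].lower().lstrip(".")
    if ext not in {"fcs", "csv", "lmd"}:
        raise ValueError(f"Cannot infer file type from extension {ext!r}")
    return ext


def resolve_labels(
    X: List[List[float]],
    var_names: List[str],
    obs: Dict[str, List[Any]],
    label_key: Union[int, str, None],
) -> Optional[List[Any]]:
    """Return per-event labels, or None when only unsupervised SOM training applies."""
    if label_key is None:
        return None
    if isinstance(label_key, int):
        return [row[label_key] for row in X]  # column index into X
    if label_key in var_names:
        col = var_names.index(label_key)  # channel name (assumed to take precedence)
        return [row[col] for row in X]
    if label_key in obs:
        return list(obs[label_key])  # per-event annotation column
    raise KeyError(label_key)
```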
- train()
Train the full gating pipeline.
This executes the full training workflow:
1. Load raw data
2. (Optional) Align channel names
3. (Optional) Relabel and preprocess
4. (Optional) Downsample
5. Construct the training matrix from all samples
6. Train the SOM or MLP gating module
The gating module is stored in self.gating_module_.
- Raises:
ValueError – If MLP is selected but label_key is None.
ValueError – If binary labels are not exactly {0, 1}.
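Construction of the training matrix and the two ValueError conditions of train() can be sketched as follows. This is a hypothetical illustration, not the FLAG-X implementation: samples are plain (events, labels) pairs, labels is None when no label_key is set, and binary is passed in explicitly rather than derived the way the pipeline sets binary_classes_.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[List[List[float]], Optional[List[int]]]


def build_training_matrix(
    samples: Sequence[Sample],
) -> Tuple[List[List[float]], Optional[List[int]]]:
    """Pool the events (and labels, if every sample has them) of all samples."""
    X = [event for events, _ in samples for event in events]
    if any(labels is None for _, labels in samples):
        return X, None
    y = [label for _, labels in samples for label in labels]
    return X, y


def validate_training_setup(
    gating_method: str,
    label_key: Optional[object],
    y: Optional[List[int]],
    binary: bool,
) -> None:
    """Raise ValueError for the two documented invalid configurations."""
    if gating_method == "mlp" and label_key is None:
        raise ValueError("The MLP gating module requires label_key to be set.")
    if binary and y is not None and set(y) != {0, 1}:
        raise ValueError(f"Binary labels must be exactly {{0, 1}}, got {sorted(set(y))}.")
```

The SOM path passes validation without labels, which mirrors the documented fact that unsupervised training is only available through the SOM classifier.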
- inference(data_file_path: str | None = None, data_file_names: List[str] | None = None, sample_wise: bool = False, gate: bool = True, dim_red_methods: Tuple[Literal['som', 'pca', 'umap', 'tsne', 'isomap', 'locallylinearembedding', 'mds', 'spectralembedding'], ...] | None = ('umap',), dim_red_method_kwargs: Tuple[Dict[str, Any] | None, ...] | None = None, save_path: str | None = None, save_filename: str | None = None, scale_channels: List[str] | None = None, val_range: Tuple[float, float] = (0.0, 1048576), keep_unscaled: bool = False)
Apply the trained pipeline to new data for gating and/or dimensionality reduction.
This performs:
- Data loading and preprocessing (same as during training)
- (Optional) Prediction using the trained model
- (Optional) Dimensionality reduction using one or more methods
- Export to FCS file(s) with annotations added in new channels
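inference() expects dim_red_method_kwargs to hold one kwargs dict per entry in dim_red_methods and raises ValueError otherwise. A minimal sketch of that pairing, assuming that None (for the whole tuple or for a single entry) means "use the method's defaults"; pair_methods_with_kwargs is a hypothetical helper, not part of FLAG-X.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def pair_methods_with_kwargs(
    methods: Optional[Sequence[str]],
    method_kwargs: Optional[Sequence[Optional[Dict[str, Any]]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Match each dimensionality reduction method with its own kwargs dict."""
    if not methods:
        return []  # dimensionality reduction disabled
    if method_kwargs is None:
        method_kwargs = [None] * len(methods)  # defaults for every method
    if len(method_kwargs) != len(methods):
        raise ValueError(
            f"Got {len(method_kwargs)} kwargs dicts for {len(methods)} methods."
        )
    # Copy each dict so later mutation cannot leak back into the caller's tuple.
    return [(m, dict(kw or {})) for m, kw in zip(methods, method_kwargs)]
```

For example, pair_methods_with_kwargs(('umap', 'pca'), ({'n_neighbors': 15}, None)) yields [('umap', {'n_neighbors': 15}), ('pca', {})]; the n_neighbors value is illustrative only.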
- Parameters:
data_file_path (str or None) – Directory containing inference data.
data_file_names (list[str] or None) – Specific inference filenames.
sample_wise (bool) – If True, run dimensionality reduction and export separately per sample.
gate (bool) – Whether to apply the trained gating model.
dim_red_methods (tuple[str] or None) – Dimensionality reduction methods to apply.
dim_red_method_kwargs (tuple[dict] or None) – One kwargs dict per method.
save_path (str or None) – Output directory for FCS export.
save_filename (str or None) – Base filename for exported FCS.
scale_channels (list[str] or None) – Additional channels to scale for FCS export (e.g., previously added integer labels).
val_range (tuple[float,float]) – Value range for scaling when writing FCS. (This is done for proper display of the added annotations in standard analysis software.)
keep_unscaled (bool) – Whether to also retain unscaled values in separate channels.
- Returns:
None
- Raises:
NotFittedError – If gating was requested but the model is not trained.
ValueError – If dimensionality reduction kwargs do not match number of methods.
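According to the parameter description, added channels are rescaled to val_range so that annotations display properly in standard analysis software (the default upper bound 1048576 equals 2**20). The sketch below assumes a linear min-max mapping and a hypothetical "<name>_unscaled" naming scheme for keep_unscaled; the actual FLAG-X scaling formula and channel names may differ.

```python
from typing import Dict, List, Sequence, Tuple


def scale_to_range(
    values: Sequence[float], val_range: Tuple[float, float] = (0.0, 1048576)
) -> List[float]:
    """Linearly map values from [min, max] onto val_range (assumed min-max scaling)."""
    lo, hi = val_range
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]  # constant channel: map everything to the lower bound
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def scale_channels_for_export(
    channels: Dict[str, Sequence[float]],
    names: Sequence[str],
    val_range: Tuple[float, float] = (0.0, 1048576),
    keep_unscaled: bool = False,
) -> Dict[str, List[float]]:
    """Rescale the named channels; optionally keep the raw values under a new name."""
    out = {name: list(vals) for name, vals in channels.items()}
    for name in names:
        if keep_unscaled:
            out[f"{name}_unscaled"] = list(channels[name])  # hypothetical naming
        out[name] = scale_to_range(channels[name], val_range)
    return out
```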
- save(filename: str = 'gating_pipeline.pkl', filepath: str | None = None)
Save the pipeline to a pickle file, including the gating model.
- Parameters:
filename (str) – Output filename.
filepath (str or None) – Directory to save to. Defaults to pipeline save_path.
- Returns:
None
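save() and load() are documented as a pickle round trip of the whole pipeline, gating model included. A minimal sketch of that pattern on a hypothetical TinyPipeline class (not the FLAG-X class), mirroring the documented defaults: save() falls back to the pipeline's save_path, and load() falls back to the current working directory.

```python
import os
import pickle
from typing import Optional


class TinyPipeline:
    """Hypothetical stand-in illustrating pickle-based save/load."""

    def __init__(self, save_path: Optional[str] = None) -> None:
        self.save_path = save_path if save_path is not None else os.getcwd()
        self.is_trained_ = False
        self.gating_module_ = None

    def save(self, filename: str = "gating_pipeline.pkl", filepath: Optional[str] = None) -> None:
        directory = filepath if filepath is not None else self.save_path
        with open(os.path.join(directory, filename), "wb") as f:
            pickle.dump(self, f)  # the whole object, fitted model included

    @classmethod
    def load(cls, filename: str = "gating_pipeline.pkl", filepath: Optional[str] = None) -> "TinyPipeline":
        directory = filepath if filepath is not None else os.getcwd()
        with open(os.path.join(directory, filename), "rb") as f:
            obj = pickle.load(f)
        if not isinstance(obj, cls):
            raise TypeError(f"Expected {cls.__name__}, got {type(obj).__name__}")
        return obj
```

Because unpickling can execute arbitrary code, only load pipeline files from trusted sources.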
- classmethod load(filename: str = 'gating_pipeline.pkl', filepath: str | None = None)
Load a previously saved GatingPipeline.
- Parameters:
filename (str) – Pipeline pickle filename.
filepath (str or None) – Directory path for the file. Defaults to CWD.
- Returns:
Fully restored pipeline instance.
- Return type: