This framework contains the implementation of the approaches presented in the paper Di Monda et al., "Rapid Few-Shot Learning for Resilient Multi-Domain Intrusion Detection".
- Place the dataset according to the path defined in `config.yaml` under `base_data_path`. Each dataset must then be placed in its own folder, as defined in `dataset_config.py`.
- It is recommended to use `virtualenv` to create an isolated Python environment:

  ```bash
  virtualenv venv
  source venv/bin/activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- It is also recommended to install GNU parallel.
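GNU parallel is used by `run_experiments.sh` to run experiment combinations concurrently (see the section on running experiments below). On Debian/Ubuntu systems, for example, it can be installed with:

```bash
sudo apt-get install parallel
```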
Navigate to the `src` directory and execute the main script:

```bash
cd src
python main.py
```

You can then append any of the following options to your command. Unless otherwise specified below, the default values are taken from `config.yaml`. The parsing logic is defined in `args_parser.py`. An example invocation is shown after the option list.
- `--seed [int]`: Seed for reproducibility.
- `--k-seed [int]`: Seed used to sample the k shots in the few-shot case.
- `--gpu`: Use GPU if available.
- `--n-thr [int]`: Number of threads.
- `--log-dir [str]`: Log directory path.
- `--n-tasks [1 or 2]`:
  - `1`: The model is trained on both the source and target datasets at the same time.
  - `2`: The model is first trained on the source dataset, then on the target dataset.
- `--network [str]`: Network to use. The value must match the `.py` filename in `src/network/` that implements the network (e.g., `lopez17cnn`).
- `--ckpt-path [str]`: Path to the `.pt` file containing the state of an approach.
- `--skip-t1`: Skip the first task on the source dataset (used only when `--n-tasks 2`).
- `--skip-t2`: Skip the second task on the target dataset (used only when `--n-tasks 2`).
- `--k [int]`: Number of shots for the target dataset. If not specified, the entire training partition of the target dataset is used.
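For instance, a two-task few-shot run could be launched as follows (the flag values here are purely illustrative, not recommended settings):

```bash
python main.py --n-tasks 2 --network lopez17cnn --k 10 --seed 1 --k-seed 1 --gpu
```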
The `config.yaml` file contains additional parameters (e.g., for early stopping and checkpointing).
The following data-related parameters control dataset selection, loading, and processing.
- `--src-dataset [str]`: Source dataset to use.
- `--trg-dataset [str]`: Target dataset to use.
- `--is-flat`: Flatten the PSQ input (used for ML approaches).
- `--num-pkts [int]`: Number of packets to consider in each biflow.
- `--fields [FIELD] ...`: Fields to use among `['PL', 'IAT', 'DIR', 'WIN', 'FLG', 'TTL']`. You can specify multiple fields (e.g., `--fields PL IAT`).
- `--return-quintuple`: Return the quintuple along with the data and labels. It is mostly used for explainability purposes.
The following options are defined in `data_module.py` (a combined example is shown after the list):
- `--batch-size [int]`: Batch size for training.
- `--adapt-batch-size [int]`: Batch size for adaptation.
- `--num-workers [int]`: Number of worker threads for data loading.
- `--pin-memory`: Enable pinned memory for faster data transfer to the GPU.
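Combining the dataset and data-loading options, a command might look like the following (the dataset names are placeholders for the folders defined in `dataset_config.py`, and the numeric values are illustrative):

```bash
python main.py --src-dataset <source> --trg-dataset <target> \
    --num-pkts 20 --fields PL IAT --batch-size 64 --num-workers 4 --pin-memory
```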
The two main modules responsible for the training and validation logic of an approach are:

- `MLModule` in `ml_module.py` for ML approaches.
- `DLModule` in `dl_module.py` for DL approaches.
To execute a specific approach located in `src/approach/`, set the `--approach` argument to the corresponding `.py` file name. Each approach defines its own set of arguments, declared within its respective class. These approach-specific arguments are listed below; an example invocation follows the list.
- Random Forest (RF) – `random_forest.py`
  - `--rf-criterion [str]`: Function to measure the quality of a split.
  - `--rf-n-estimators [int]`: Number of trees in the forest.
  - `--rf-max-depth [int]`: Maximum depth of the trees.
- XGBoost (XGB) – `xgb.py`
  - `--xgb-n-estimators [int]`: Number of boosting rounds.
  - `--xgb-max-depth [int]`: Maximum tree depth for base learners.
  - `--xgb-eval-metric [str]`: Evaluation metric for validation data.
- Baseline (Fine-tuning and Freezing) – `baseline.py`
  - `--adaptation-strat [str]`: Strategy for adapting the model (`finetuning` or `freezing`).
  - `--adapt-lr [float]`: Learning rate for adaptation.
  - `--adapt-epochs [int]`: Number of epochs for adaptation.
- Rethinking Few-Shot (RFS) – `md_rfs.py`
  - `--alpha [float]`: Weighting factor for the distillation loss.
  - `--gamma [float]`: Weighting factor for the classification loss.
  - `--is-distill`: Enables knowledge distillation.
  - `--kd-t [float]`: Temperature for the distillation loss.
  - `--teacher-path [str]`: Path to the pretrained teacher model.
  - `--discr-path [str]`: Path to the domain discriminator.
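For example, the RFS approach with distillation could be invoked as follows (the hyperparameter values and the checkpoint path are placeholders, not values from the paper):

```bash
python main.py --approach md_rfs --n-tasks 2 --is-distill \
    --alpha 0.5 --gamma 0.5 --kd-t 4.0 --teacher-path <path/to/teacher.pt>
```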
```
md-rfs/
├── config.yaml
├── requirements.txt
└── src/
    ├── main.py
    ├── run_experiments.sh
    ├── approach/
    ├── callback/
    ├── data/
    ├── module/
    ├── network/
    ├── trainer/
    └── util/
```
This project is organized into multiple directories, each serving a specific purpose.
- Approach: located in `src/approach/`, this directory contains the implementations of the different approaches. Each approach defines its own training, validation, and inference logic, and can be selected via the `--approach` argument.
- Callback: located in `src/callback/`, this directory includes callback functions that are executed at specific points during the code's execution. These callbacks handle tasks such as early stopping, model checkpointing, logging outputs, and more.
- Data: located in `src/data/`, this directory is responsible for dataset management, including loading, preprocessing, and configuration. It provides utilities to read datasets, set up batch sizes, and define dataset-related parameters.
- Module: located in `src/module/`, this directory contains core components related to DL-based approaches. It includes implementations of custom loss functions, teacher–student learning strategies, neural network heads, and more.
- Network: located in `src/network/`, this directory defines the different neural network architectures used in the project. It provides a selection of predefined networks and a factory method for dynamically choosing a network based on the configuration.
- Trainer: located in `src/trainer/`, this directory contains the main training pipeline. It manages the optimization process, evaluation, and model adaptation flows.
- Util: located in `src/util/`, this directory includes utility functions that support the overall framework. It handles configuration management, argument parsing, logging, directory creation, and setting random seeds for reproducibility.
Experiments can be executed in two ways:
You can manually run experiments by navigating to the `src` directory and executing:

```bash
python main.py --src-dataset <source> --trg-dataset <target> --approach <approach> --seed <seed> [other options]
```

This allows full control over individual experiment parameters.
For running multiple experiments in a combinatorial manner, use the `run_experiments.sh` script:

```bash
./run_experiments.sh --src-dataset sd1,sd2 --trg-dataset td1,td2 \
    --seed 0-10 --approach random_forest,xgb --cpu 4 --log-keyword test
```

This script automatically generates experiment combinations from the provided parameters and runs them in parallel if GNU parallel is available; otherwise, it falls back to `xargs`.
We thank the authors of the following open-source implementations that were used in this work:
If you use this framework, please cite:
```bibtex
@inproceedings{dimonda2025rapid,
  title={Rapid Few-Shot Learning for Resilient Multi-Domain Intrusion Detection},
  author={Di Monda, Davide and Rustam, Furquan and Jurcut, Anca Delia and Pescapè, Antonio},
  booktitle={IEEE Global Communications Conference (GLOBECOM)},
  pages={xxxx--xxxx},
  year={2025},
  organization={IEEE}
}
```