This framework contains the implementation of the approaches presented in the paper Di Monda et al., "Rapid Few-Shot Learning for Resilient Multi-Domain Intrusion Detection".
- Place the dataset according to the path defined in `config.yaml` under `base_data_path`. Each dataset must then be placed in its own folder, as defined in `dataset_config.py`.
- It is recommended to use `virtualenv` to create an isolated Python environment:

  ```bash
  virtualenv venv
  source venv/bin/activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- It is also recommended to install GNU parallel.
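GNU parallel is used by `run_experiments.sh` to run experiment combinations concurrently (see the section on running experiments below). On Debian/Ubuntu systems, for example, it can be installed with:

```bash
sudo apt-get install parallel
```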
Navigate to the `src` directory and execute the main script:

```bash
cd src
python main.py
```

You can then append any of the following options to your command. Unless otherwise specified below, the default values are taken from `config.yaml`. The parsing logic is defined in `args_parser.py`. An example invocation is shown after the option list.
- `--seed [int]`: Seed for reproducibility.
- `--k-seed [int]`: Seed used to sample the k shots in the few-shot case.
- `--gpu`: Use GPU if available.
- `--n-thr [int]`: Number of threads.
- `--log-dir [str]`: Log directory path.
- `--n-tasks [1 or 2]`:
  - `1`: The model is trained on both the source and target datasets at the same time.
  - `2`: The model is first trained on the source dataset, then on the target dataset.
- `--network [str]`: Network to use. The value must match the `.py` filename in `src/network/` that implements the network (e.g., `lopez17cnn`).
- `--ckpt-path [str]`: Path to the `.pt` file containing the state of an approach.
- `--skip-t1`: Skip the first task on the source dataset (used only when `--n-tasks 2`).
- `--skip-t2`: Skip the second task on the target dataset (used only when `--n-tasks 2`).
- `--k [int]`: Number of shots for the target dataset. If not specified, the entire training partition of the target dataset is used.
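For instance, a two-task few-shot run could be launched as follows (the flag values here are purely illustrative, not recommended settings):

```bash
python main.py --n-tasks 2 --network lopez17cnn --k 10 --seed 1 --k-seed 1 --gpu
```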
The `config.yaml` file contains additional parameters (e.g., for early stopping and checkpointing).
The following data-related parameters control dataset selection, loading, and processing.
- `--src-dataset [str]`: Source dataset to use.
- `--trg-dataset [str]`: Target dataset to use.
- `--is-flat`: Flatten the PSQ input (used for ML approaches).
- `--num-pkts [int]`: Number of packets to consider in each biflow.
- `--fields [FIELD] ...`: Fields to use among `['PL', 'IAT', 'DIR', 'WIN', 'FLG', 'TTL']`. You can specify multiple fields (e.g., `--fields PL IAT`).
- `--return-quintuple`: Return the quintuple along with the data and labels. It is mostly used for explainability purposes.
The following options are defined in `data_module.py` (a combined example is shown after the list):
- `--batch-size [int]`: Batch size for training.
- `--adapt-batch-size [int]`: Batch size for adaptation.
- `--num-workers [int]`: Number of worker threads for data loading.
- `--pin-memory`: Enable pinned memory for faster data transfer to the GPU.
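Combining the dataset and data-loading options, a command might look like the following (the dataset names are placeholders for the folders defined in `dataset_config.py`, and the numeric values are illustrative):

```bash
python main.py --src-dataset <source> --trg-dataset <target> \
    --num-pkts 20 --fields PL IAT --batch-size 64 --num-workers 4 --pin-memory
```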
The two main modules responsible for the training and validation logic of an approach are:

- `MLModule` in `ml_module.py` for ML approaches.
- `DLModule` in `dl_module.py` for DL approaches.
To execute a specific approach located in `src/approach/`, set the `--approach` argument to the corresponding `.py` file name. Each approach defines its own set of arguments, declared within its respective class. These approach-specific arguments are listed below; an example invocation follows the list.
- Random Forest (RF) – `random_forest.py`
  - `--rf-criterion [str]`: Function to measure the quality of a split.
  - `--rf-n-estimators [int]`: Number of trees in the forest.
  - `--rf-max-depth [int]`: Maximum depth of the trees.
- XGBoost (XGB) – `xgb.py`
  - `--xgb-n-estimators [int]`: Number of boosting rounds.
  - `--xgb-max-depth [int]`: Maximum tree depth for base learners.
  - `--xgb-eval-metric [str]`: Evaluation metric for validation data.
- Baseline (Fine-tuning and Freezing) – `baseline.py`
  - `--adaptation-strat [str]`: Strategy for adapting the model (`finetuning` or `freezing`).
  - `--adapt-lr [float]`: Learning rate for adaptation.
  - `--adapt-epochs [int]`: Number of epochs for adaptation.
- Rethinking Few-Shot (RFS) – `md_rfs.py`
  - `--alpha [float]`: Weighting factor for the distillation loss.
  - `--gamma [float]`: Weighting factor for the classification loss.
  - `--is-distill`: Enables knowledge distillation.
  - `--kd-t [float]`: Temperature for the distillation loss.
  - `--teacher-path [str]`: Path to the pretrained teacher model.
  - `--discr-path [str]`: Path to the domain discriminator.
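For example, the RFS approach with distillation could be invoked as follows (the hyperparameter values and the checkpoint path are placeholders, not values from the paper):

```bash
python main.py --approach md_rfs --n-tasks 2 --is-distill \
    --alpha 0.5 --gamma 0.5 --kd-t 4.0 --teacher-path <path/to/teacher.pt>
```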
```
md-rfs/
├── config.yaml
├── requirements.txt
└── src/
    ├── main.py
    ├── run_experiments.sh
    ├── approach/
    ├── callback/
    ├── data/
    ├── module/
    ├── network/
    ├── trainer/
    └── util/
```
This project is organized into multiple directories, each serving a specific purpose.
- Approach: located in `src/approach/`, this directory contains the implementations of the different approaches. Each approach defines its own training, validation, and inference logic, and can be selected via the `--approach` argument.
- Callback: located in `src/callback/`, this directory includes callback functions that are executed at specific points during the code's execution. These callbacks handle tasks such as early stopping, model checkpointing, logging outputs, and more.
- Data: located in `src/data/`, this directory is responsible for dataset management, including loading, preprocessing, and configuration. It provides utilities to read datasets, set up batch sizes, and define dataset-related parameters.
- Module: located in `src/module/`, this directory contains core components related to DL-based approaches. It includes implementations of custom loss functions, teacher–student learning strategies, neural network heads, and more.
- Network: located in `src/network/`, this directory defines the different neural network architectures used in the project. It provides a selection of predefined networks and a factory method for dynamically choosing a network based on the configuration.
- Trainer: located in `src/trainer/`, this directory contains the main training pipeline. It manages the optimization process, evaluation, and model adaptation flows.
- Util: located in `src/util/`, this directory includes utility functions that support the overall framework. It handles configuration management, argument parsing, logging, directory creation, and setting random seeds for reproducibility.
Experiments can be executed in two ways:
You can manually run experiments by navigating to the `src` directory and executing:

```bash
python main.py --src-dataset <source> --trg-dataset <target> --approach <approach> --seed <seed> [other options]
```

This allows full control over individual experiment parameters.
For running multiple experiments in a combinatorial manner, use the `run_experiments.sh` script:

```bash
./run_experiments.sh --src-dataset sd1,sd2 --trg-dataset td1,td2 \
    --seed 0-10 --approach random_forest,xgb --cpu 4 --log-keyword test
```

This script automatically generates experiment combinations from the provided parameters and runs them in parallel if GNU parallel is available; otherwise, it falls back to `xargs`.
We thank the authors of the following open-source implementations that were used in this work:
If you use this framework, please cite:
```bibtex
@inproceedings{dimonda2025rapid,
  title={Rapid Few-Shot Learning for Resilient Multi-Domain Intrusion Detection},
  author={Di Monda, Davide and Rustam, Furquan and Jurcut, Anca Delia and Pescapè, Antonio},
  booktitle={IEEE Global Communications Conference (GLOBECOM)},
  pages={xxxx--xxxx},
  year={2025},
  organization={IEEE}
}
```