- 📜 Paper
- 🤗 AlignX
- 🤗 AlignXtest
- 🤗 AlignXpertICA (Training with a 7% Subset)
- 🤗 AlignXpertPBA (Training with a 7% Subset)
- 🤗 AlignXpertICA (Training with the Full Dataset)
- 🤗 AlignXpertPBA (Training with the Full Dataset)
- [2025/03/24]: We have published a survey that presents the first comprehensive review of personalized alignment—a paradigm that enables LLMs to adapt their behavior within ethical boundaries based on individual preferences. For more details, see A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications.
The table below summarizes the data sources and statistics for AlignX, which combines large-scale Reddit data with existing alignment datasets to preserve universal value-alignment capabilities, for a total of 1,311,622 samples.
| Source | Reddit | PKU-SafeRLHF | UltraFeedback | HelpSteer2 |
|---|---|---|---|---|
| Dimension | The 90 self-defined preference dimensions | Safety | Helpfulness / Honesty / Instruction-Following / Truthfulness | Helpfulness / Correctness / Coherence / Complexity / Verbosity |
| #Examples | 1,225,988 | 10,714 | 11,629 / 16,809 / 36,169 / 7,219 | 2,255 / 144 / 26 / 33 / 636 |
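As a quick sanity check on the released data, the Hugging Face `datasets` library can load AlignX directly from the Hub. The repository id and split name below are placeholders; replace them with the actual 🤗 AlignX dataset linked above.

```python
from datasets import load_dataset

# NOTE: "your-org/AlignX" is a placeholder repository id; use the 🤗 AlignX
# dataset linked above. The "train" split name is likewise an assumption.
dataset = load_dataset("your-org/AlignX", split="train")

print(len(dataset))       # expected to total 1,311,622 samples across all sources
print(dataset[0].keys())  # prompt, chosen, rejected, preference fields, ...
```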
We implement In-Context Alignment (ICA) and Preference-Bridged Alignment (PBA) on top of Llama-3.1-8B-Instruct, training each model on both the 7% subset (91,918 samples) and the full dataset (1,311,622 samples). The experimental results in the table below show that our models significantly outperform the baselines.
The training code is built on top of OpenRLHF.
Construct training data:
```bash
cd train
python format_data.py
```
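Conceptually, this step turns each AlignX record into a (prompt, chosen, rejected) preference pair that OpenRLHF's DPO trainer can consume. The sketch below is illustrative only, with hypothetical field handling and file names; it is not the actual `format_data.py`.

```python
import json

def to_dpo_record(sample: dict) -> dict:
    """Map one AlignX sample to a minimal DPO-style preference pair.

    Folding the demographic description into the prompt is purely
    illustrative; the real format_data.py may structure the user context
    (UGC, pair-wise feedback) differently.
    """
    context = sample.get("Demographic Information", "")
    return {
        "prompt": f"{context}\n\n{sample['prompt']}",
        "chosen": sample["chosen"],
        "rejected": sample["rejected"],
    }

# Hypothetical input/output paths.
with open("alignx_train.jsonl") as fin, open("dpo_train.jsonl", "w") as fout:
    for line in fin:
        fout.write(json.dumps(to_dpo_record(json.loads(line))) + "\n")
```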
Train AlignXpertICA with DPO:

```bash
cd train/OpenRLHF/examples/scripts
./ica_dpo.sh
```
Train AlignXpertPBA with DPO:

```bash
cd train/OpenRLHF/examples/scripts
./pba_dpo.sh
```
./eval/loss_ica.py and ./eval/loss_pba.py compute the log probabilities of the chosen and rejected responses with AlignXpertICA and AlignXpertPBA as the policy models, respectively, while ./eval/loss_few_shot.py does the same for the reference model. Once the log probabilities for both the policy and reference models are available, ./eval/acc.py computes the Alignment Accuracy.
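Alignment Accuracy counts how often the DPO implicit reward (the policy-to-reference log-probability ratio) is larger for the chosen response than for the rejected one; the β factor cancels in the comparison. Below is a minimal sketch of that computation, assuming the log probabilities were dumped as parallel JSON lists; the file names and layout are hypothetical and do not reflect the actual output format of the loss scripts.

```python
import json

def alignment_accuracy(policy, reference):
    """Fraction of pairs whose implicit-reward margin favors the chosen response.

    policy / reference: lists of {"chosen": float, "rejected": float} log
    probabilities for the same evaluation pairs, in the same order.
    """
    correct = 0
    for p, r in zip(policy, reference):
        chosen_margin = p["chosen"] - r["chosen"]        # log pi(y_c|x) - log pi_ref(y_c|x)
        rejected_margin = p["rejected"] - r["rejected"]  # log pi(y_r|x) - log pi_ref(y_r|x)
        correct += chosen_margin > rejected_margin
    return correct / len(policy)

# Hypothetical dumps produced by the loss scripts above.
with open("logps_policy.json") as f:
    policy_logps = json.load(f)
with open("logps_reference.json") as f:
    reference_logps = json.load(f)

print(f"Alignment Accuracy: {alignment_accuracy(policy_logps, reference_logps):.4f}")
```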
Responses generated by ./eval/gen_ica.py, ./eval/gen_pba.py, and ./eval/gen_few_shot.py are evaluated using GPT-4.
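For reference, a hedged sketch of a GPT-4 pairwise judging call with the official OpenAI Python client is shown below; the judge prompt and answer format are placeholders and do not reproduce the exact evaluation protocol used in this repo.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(post: str, response_a: str, response_b: str, user_profile: str) -> str:
    """Ask GPT-4 which response better matches the user's preferences.

    The instruction wording is illustrative only.
    """
    judge_prompt = (
        f"User profile:\n{user_profile}\n\n"
        f"Post:\n{post}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Which response better fits this user's preferences? Answer 'A' or 'B'."
    )
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": judge_prompt}],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()
```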
{ "prompt": "", // the post eliciting responses "chosen": "", // the user-preferred response "rejected": "", // the less preferred response relative to "chosen" "Preference Direction": [0/0.5/1] * 90, // a 90-element list: 1 = "Positive" (higher levels preferred), 0 = "Negative" (lower levels preferred), 0.5 = "Neutral" (no clear preference) "Demographic Information": "", // a comprehensive natural language description of the user "User-Generated Content": [ // comments written by the same user on other posts { // UGC 1 "prompt": "", "comment": "", "Preference Direction": [0/0.5/1] * 90 }, { // UGC 2 ... }, { // UGC 3 ... }, { // UGC 4 ... } ], "Pair-wise Comparative Feedback": [ // the preference pairs of the same user for comments under other posts { // PAIR 1 "prompt": "", "chosen": "", "rejected": "", "Preference Direction": [0/0.5/1] * 90 }, { // PAIR 2 ... }, { // PAIR 3 ... }, { // PAIR 4 ... } ] }