From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment

AlignX

A large-scale dataset of over 1.3 million personalized preference examples

Links

News

Dataset Statistics

The table below summarizes the data sources and statistics for AlignX, which combines large-scale Reddit data with existing alignment datasets to preserve general value alignment capability, for a total of 1,311,622 samples.

| Source | Reddit | PKU-SafeRLHF | UltraFeedback | HelpSteer2 |
| --- | --- | --- | --- | --- |
| Dimension | The 90 self-defined preference dimensions | Safety | Helpfulness / Honesty / Instruction-Following / Truthfulness | Helpfulness / Correctness / Coherence / Complexity / Verbosity |
| #Examples | 1,225,988 | 10,714 | 11,629 / 16,809 / 36,169 / 7,219 | 2,255 / 144 / 26 / 33 / 636 |

Dataset Overview

Dataset Format

{
    "prompt": "", // the post eliciting responses
    "chosen": "", // the user-preferred response
    "rejected": "", // the less preferred response relative to "chosen"
    "Preference Direction": [0/0.5/1] * 90, // a 90-element list: 1 = "Positive" (higher levels preferred), 0 = "Negative" (lower levels preferred), 0.5 = "Neutral" (no clear preference)
    "Demographic Information": "", // a comprehensive natural language description of the user
    "User-Generated Content": [ // comments written by the same user on other posts
        { // UGC 1
            "prompt": "",
            "comment": "",
            "Preference Direction": [0/0.5/1] * 90
        },
        { // UGC 2
            ...
        },
        { // UGC 3
            ...
        },
        { // UGC 4
            ...
        }
    ],
    "Pair-wise Comparative Feedback": [ // the preference pairs of the same user for comments under other posts
        { // PAIR 1
            "prompt": "",
            "chosen": "",
            "rejected": "",
            "Preference Direction": [0/0.5/1] * 90
        },
        { // PAIR 2
            ...
        },
        { // PAIR 3
            ...
        },
        { // PAIR 4
            ...
        }
    ]
}
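
As a quick sanity check, the snippet below sketches how one record in this format can be loaded and validated in Python. The file name alignx_sample.jsonl and the JSON Lines layout are placeholders rather than files shipped with the repository; the counts of four UGC items and four comparison pairs follow the layout shown above.

import json

# Placeholder path: point this at an actual AlignX data file (JSON Lines assumed).
with open("alignx_sample.jsonl", "r", encoding="utf-8") as f:
    example = json.loads(f.readline())

# The preference-direction vector has 90 entries, each 0 (Negative), 0.5 (Neutral), or 1 (Positive).
direction = example["Preference Direction"]
assert len(direction) == 90 and all(v in (0, 0.5, 1) for v in direction)

# Auxiliary user signals: user description, UGC items, and pairwise comparisons.
print(example["Demographic Information"][:200])
print(len(example["User-Generated Content"]), "UGC items,",
      len(example["Pair-wise Comparative Feedback"]), "comparison pairs")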

AlignXpert

We implement In-Context Alignment (ICA) and Preference-Bridged Alignment (PBA) on top of Llama-3.1-8B-Instruct, training on a 7% subset (91,918 samples) and on the full dataset (1,311,622 samples), respectively. The experimental results are shown in the table below, where our model significantly outperforms the baselines.

Training

The training code is developed on top of OpenRLHF.

Construct training data:

cd train
python format_data.py
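
For intuition, here is a hedged sketch of the kind of flattening this step performs: each AlignX record becomes a plain (prompt, chosen, rejected) DPO pair with the user's context prepended to the prompt. The template wording, the in-context ordering, and the alignx_sample.jsonl / dpo_train.jsonl file names are illustrative assumptions; the actual logic lives in train/format_data.py.

import json

def to_dpo_record(example):
    # Hypothetical in-context template; see train/format_data.py for the real one.
    parts = ["User profile: " + example["Demographic Information"]]
    for ugc in example["User-Generated Content"]:
        parts.append("Post: " + ugc["prompt"] + "\nUser comment: " + ugc["comment"])
    for pair in example["Pair-wise Comparative Feedback"]:
        parts.append("Post: " + pair["prompt"]
                     + "\nPreferred: " + pair["chosen"]
                     + "\nDispreferred: " + pair["rejected"])
    user_context = "\n\n".join(parts)
    return {
        "prompt": user_context + "\n\nNow respond to this post:\n" + example["prompt"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }

with open("alignx_sample.jsonl") as fin, open("dpo_train.jsonl", "w") as fout:
    for line in fin:
        fout.write(json.dumps(to_dpo_record(json.loads(line))) + "\n")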

In-context alignment (ICA)

cd train/OpenRLHF/examples/scripts
./ica_dpo.sh

Preference-bridged alignment (PBA)

cd train/OpenRLHF/examples/scripts
./pba_dpo.sh

Evaluation

Alignment Accuracy

./eval/loss_ica.py and ./eval/loss_pba.py compute the log probabilities of the chosen and rejected responses with AlignXpert-ICA and AlignXpert-PBA as the policy models, respectively, while ./eval/loss_few_shot.py does the same for the reference model. Once the log probabilities from both the policy and reference models are available, ./eval/acc.py computes the Alignment Accuracy.
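
For reference, one common way to turn these dumps into an accuracy is the DPO-style implicit-reward comparison sketched below: a pair counts as correct when the policy's log-probability gain over the reference is larger for the chosen response than for the rejected one. The metric definition and the policy_logps.json / ref_logps.json dump layout are assumptions; ./eval/acc.py is authoritative.

import json

def alignment_accuracy(policy_logps, ref_logps):
    # Each entry holds the summed log probabilities of one pair's chosen/rejected responses.
    correct = 0
    for pol, ref in zip(policy_logps, ref_logps):
        chosen_margin = pol["chosen"] - ref["chosen"]        # implicit reward of the chosen response
        rejected_margin = pol["rejected"] - ref["rejected"]  # implicit reward of the rejected response
        correct += chosen_margin > rejected_margin
    return correct / len(policy_logps)

# Hypothetical dump layout: a JSON list of {"chosen": float, "rejected": float} objects.
with open("policy_logps.json") as f:
    policy = json.load(f)
with open("ref_logps.json") as f:
    reference = json.load(f)
print("Alignment Accuracy:", round(alignment_accuracy(policy, reference), 3))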

GPT-4 Win Rate

Responses generated by ./eval/gen_ica.py, ./eval/gen_pba.py, and ./eval/gen_few_shot.py are evaluated using GPT-4.
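
A minimal sketch of one pairwise GPT-4 judgment is given below. The judging prompt, the 'A'/'B' parsing, and the choice of comparison baseline are assumptions about the protocol, which this README does not spell out; the snippet only uses calls from the official openai Python SDK.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_TEMPLATE = (
    "A user asked: {prompt}\n\n"
    "Response A:\n{a}\n\nResponse B:\n{b}\n\n"
    "Considering the user's personal preferences, which response is better? Answer with 'A' or 'B' only."
)

def judge_prefers_a(prompt, response_a, response_b):
    # Hypothetical judging prompt; the actual evaluation prompt is not specified in this README.
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user",
                   "content": JUDGE_TEMPLATE.format(prompt=prompt, a=response_a, b=response_b)}],
    )
    return reply.choices[0].message.content.strip().upper().startswith("A")

Running each comparison twice with the response order swapped and averaging the two verdicts is a common way to reduce position bias in the judge.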
