CPU OOM when pretraining HuBERT

Hi, I ran into a CPU out-of-memory (OOM) error while pretraining HuBERT. The problem is that ```load_label_offset``` and ```load_audio``` in ```fairseq/data/audio/hubert_dataset.py``` load all the data into a list at once. Are there any good solutions for this? I have roughly 200 million data entries.
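To make the question concrete, here is a rough sketch of the kind of workaround I have in mind: keep the per-line byte offsets of the label file in a memory-mapped int64 array on disk instead of a Python list, and read each label lazily with `seek`. The names `build_offset_index` and `LazyLabels` and the index file layout are my own assumptions for illustration; none of this is existing fairseq code, and I have not wired it into `hubert_dataset.py`.

```python
import numpy as np


def build_offset_index(label_path, index_path, chunk=1_000_000):
    """Stream the label file once and write line boundaries as raw int64.

    The index holds N + 1 offsets for N lines: the start of every line
    plus the end of the file. Only `chunk` offsets are held in memory.
    (Hypothetical helper, not part of fairseq.)
    """
    pos = 0
    buf = [0]
    with open(label_path, "rb") as src, open(index_path, "wb") as dst:
        for line in src:
            pos += len(line)
            buf.append(pos)
            if len(buf) >= chunk:
                np.asarray(buf, dtype=np.int64).tofile(dst)
                buf = []
        if buf:
            np.asarray(buf, dtype=np.int64).tofile(dst)


class LazyLabels:
    """Read one label line on demand using the on-disk offset index.

    (Hypothetical class, not part of fairseq.)
    """

    def __init__(self, label_path, index_path):
        self.label_path = label_path
        # memmap keeps the offsets on disk; pages are loaded as needed.
        self.offsets = np.memmap(index_path, dtype=np.int64, mode="r")
        self._fh = None  # opened lazily so each worker gets its own handle

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if self._fh is None:
            self._fh = open(self.label_path, "rb")
        start, end = int(self.offsets[i]), int(self.offsets[i + 1])
        self._fh.seek(start)
        return self._fh.read(end - start).decode("utf-8").rstrip("\n")
```

The index would be built once offline and reused across runs. The same idea could presumably apply to the audio manifest (sizes in a NumPy array, paths looked up by offset), but I am not sure whether that fits how the dataset class uses those lists.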