- The Chinese University of Hong Kong
- Hong Kong SAR, China
Stars
LLMs for high-throughput mining and generation of antimicrobial peptides
Genome modeling and design across all domains of life
Deezer source separation library including pretrained models.
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome
An R package to calculate indices and theoretical physicochemical properties of peptides and protein sequences.
"Dive into LLMs" (动手学大模型): a series of hands-on programming tutorials
[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
User-friendly and accurate binder design pipeline
Source code of ProDMM. The paper is titled "Unveiling Protein-DNA Interdependency: Harnessing Unified Multimodal Sequence Modeling, Understanding and Generation".
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
The second version of the Kraken taxonomic sequence classification system
Tools for working with bisulfite sequencing data while preserving reads' intrinsic dependencies
Code repository accompanying the manuscript "Symbiont loss and gain, rather than co-diversification shapes honeybee gut microbiota diversity and function"
Visualize outputs of AmpliconArchitect and AmpliconReconstructor in Circos-style images.
A topic-centric list of HQ open datasets.
A reproducible Snakemake pipeline for end-to-end cell-free DNA (cfDNA) fragmentomics analysis (WPS, TSS, CTCF, motifs, MDS).
A Snakemake pipeline for processing and analysis of cell-free RNA (cfRNA) sequencing data.
A community-curated awesome list of tools, software, databases, and other resources for working with and analysing RNA viruses