Try TRELLIS on Replicate - Generate 3D models from images in your browser!
Note: This Replicate deployment is maintained by firtoz, a fan of the TRELLIS project, and is not officially affiliated with Microsoft or the TRELLIS team. All rights, licenses, and intellectual property belong to Microsoft. For the original project, please visit microsoft/TRELLIS.
TRELLIS is a powerful 3D asset generation model that converts text or image prompts into high-quality 3D assets. This Replicate deployment focuses on the image-to-3D generation capabilities of TRELLIS.
03/25/2025
- Release training code.
- Release TRELLIS-text models and asset variants generation.
- Examples are provided as example_text.py and example_variant.py.
- Gradio demo is provided as app_text.py.
- Note: It is always recommended to do text-to-3D generation by first generating images with text-to-image models and then using the TRELLIS-image models for 3D generation. Text-conditioned models are less creative and detailed due to data limitations.
12/26/2024
- Release TRELLIS-500K dataset and toolkits for data preparation.
12/18/2024
- Implementation of multi-image conditioning for the TRELLIS-image model (#7). This is based on a tuning-free algorithm without training a specialized model, so it may not give the best results for all input images.
- Add Gaussian export in `app.py` and `example.py`. (#40)
TRELLIS uses a unified Structured LATent (SLAT) representation that enables generation of different 3D output formats. The model deployed here is TRELLIS-image-large, which contains 1.2B parameters and is trained on a diverse dataset of 500K 3D objects.
Key features:
- Generate high-quality 3D assets from input images
- Multiple output formats: 3D Gaussians, Radiance Fields, and textured meshes
- Detailed shape and texture generation
- Support for various viewpoint renderings
For more examples and to try it directly in your browser, visit the Replicate model page.
The model accepts:
- An input image (PNG or JPEG format)
- Optional parameters for controlling the generation process
The model outputs:
- A GLB file containing the generated 3D model with textures
- Preview renders from multiple angles
- Optional: Raw 3D Gaussians or Radiance Field representations
Run the model with the Replicate Python client:

```python
import replicate

output = replicate.run(
    "firtoz/trellis:version",
    input={
        "seed": 0,
        "image": "https://replicate.delivery/pbxt/M6rvlcKpjcTijzvLfJw8SCWQ74M1jrxowbVDT6nNTxREcvxO/ephemeros_cartoonish_character_art_cyberpunk_crocodile_white_ba_486fb649-bc68-46a0-b429-751b43734b89.png",
        "texture_size": 1024,
        "mesh_simplify": 0.95,
        "generate_color": True,
        "generate_model": True,
        "randomize_seed": True,
        "generate_normal": True,
        "ss_sampling_steps": 12,
        "slat_sampling_steps": 12,
        "ss_guidance_strength": 7.5,
        "slat_guidance_strength": 3
    }
)
print(output)
```
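The exact structure of `output` can vary between model versions; as a minimal sketch, assuming it is a URL string (or a list of them) pointing to the generated GLB, you could save the result locally like this:

```python
import urllib.request

# Hypothetical post-processing: normalize the output to a list of URLs.
# The real structure may differ; inspect `output` for your model version.
urls = output if isinstance(output, list) else [output]
for i, url in enumerate(urls):
    urllib.request.urlretrieve(str(url), f"trellis_output_{i}.glb")
```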
2. Install the dependencies:
**Before running the following command, there are some things to note:**
- By adding `--new-env`, a new conda environment named `trellis` will be created. If you want to use an existing conda environment, please remove this flag.
- By default the `trellis` environment will use pytorch 2.4.0 with CUDA 11.8. If you want to use a different version of CUDA (e.g., if you have CUDA Toolkit 12.2 installed and do not want to install another 11.8 version for submodule compilation), you can remove the `--new-env` flag and manually install the required dependencies. Refer to [PyTorch](https://pytorch.org/get-started/previous-versions/) for the installation command.
- If you have multiple CUDA Toolkit versions installed, `PATH` should be set to the correct version before running the command. For example, if you have CUDA Toolkit 11.8 and 12.2 installed, you should run `export PATH=/usr/local/cuda-11.8/bin:$PATH` before running the command.
- By default, the code uses the `flash-attn` backend for attention. For GPUs that do not support `flash-attn` (e.g., NVIDIA V100), you can remove the `--flash-attn` flag to install `xformers` only, and set the `ATTN_BACKEND` environment variable to `xformers` before running the code. See the [Minimal Example](#minimal-example) for more details.
- The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time.
- If you encounter any issues during the installation, feel free to open an issue or contact us.
Create a new conda environment named `trellis` and install the dependencies:
```sh
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast
```
The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`.
```sh
Usage: setup.sh [OPTIONS]
Options:
-h, --help Display this help message
--new-env Create a new conda environment
--basic Install basic dependencies
--train Install training dependencies
--xformers Install xformers
--flash-attn Install flash-attn
--diffoctreerast Install diffoctreerast
--vox2seq Install vox2seq
--spconv Install spconv
--mipgaussian Install mip-splatting
--kaolin Install kaolin
--nvdiffrast Install nvdiffrast
--demo Install all dependencies for demo
```
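As noted above, if your GPU does not support `flash-attn` (e.g., NVIDIA V100), you can switch the attention backend with an environment variable before running any of the example scripts; a minimal sketch:

```sh
# Assumes xformers was installed via setup.sh (--xformers flag)
export ATTN_BACKEND=xformers
python example.py
```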
Here is an example of how to use the pretrained models for 3D asset generation.
```python
import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Load a pipeline from a model folder or a Hugging Face model hub.
pipeline = TrellisImageTo3DPipeline.from_pretrained("microsoft/TRELLIS-image-large")
pipeline.cuda()

# Load an image
image = Image.open("assets/example_image/T.png")

# Run the pipeline
outputs = pipeline.run(
    image,
    seed=1,
    # Optional parameters
    # sparse_structure_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 7.5,
    # },
    # slat_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 3,
    # },
)
# outputs is a dictionary containing generated 3D assets in different formats:
# - outputs['gaussian']: a list of 3D Gaussians
# - outputs['radiance_field']: a list of radiance fields
# - outputs['mesh']: a list of meshes

# Render the outputs
video = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("sample_gs.mp4", video, fps=30)
video = render_utils.render_video(outputs['radiance_field'][0])['color']
imageio.mimsave("sample_rf.mp4", video, fps=30)
video = render_utils.render_video(outputs['mesh'][0])['normal']
imageio.mimsave("sample_mesh.mp4", video, fps=30)

# GLB files can be extracted from the outputs
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    # Optional parameters
    simplify=0.95,        # Ratio of triangles to remove in the simplification process
    texture_size=1024,    # Size of the texture used for the GLB
)
glb.export("sample.glb")

# Save Gaussians as PLY files
outputs['gaussian'][0].save_ply("sample.ply")
```
After running the code, you will get the following files:

- `sample_gs.mp4`: a video showing the 3D Gaussian representation
- `sample_rf.mp4`: a video showing the Radiance Field representation
- `sample_mesh.mp4`: a video showing the mesh representation
- `sample.glb`: a GLB file containing the extracted textured mesh
- `sample.ply`: a PLY file containing the 3D Gaussian representation
`app.py` provides a simple web demo for 3D asset generation. Since this demo is based on Gradio, additional dependencies are required:

```sh
. ./setup.sh --demo
```

After installing the dependencies, you can run the demo with the following command:

```sh
python app.py
```

Then, you can access the demo at the address shown in the terminal.
We provide TRELLIS-500K, a large-scale dataset containing 500K 3D assets curated from Objaverse(XL), ABO, 3D-FUTURE, HSSD, and Toys4k, filtered based on aesthetic scores. Please refer to the dataset README for more details.
TRELLIS's training framework is organized to provide a flexible and modular approach to building and fine-tuning large-scale 3D generation models. The training code is centered around `train.py` and is structured into several directories to clearly separate dataset handling, model components, training logic, and visualization utilities.
- train.py: Main entry point for training.
- trellis/datasets: Dataset loading and preprocessing.
- trellis/models: Different models and their components.
- trellis/modules: Custom modules for various models.
- trellis/pipelines: Inference pipelines for different models.
- trellis/renderers: Renderers for different 3D representations.
- trellis/representations: Different 3D representations.
- trellis/trainers: Training logic for different models.
- trellis/utils: Utility functions for training and visualization.
- Prepare the Environment:
  - Ensure all training dependencies are installed.
  - Use a Linux system with an NVIDIA GPU (the models are trained on NVIDIA A100 GPUs).
  - For distributed training, verify that your nodes can communicate through the designated master address and port.
- Dataset Preparation:
  - Organize your dataset similar to TRELLIS-500K. Specify your dataset path using the `--data_dir` argument when launching training.
- Configuration Files:
  - Training hyperparameters and model architectures are defined in configuration files under the `configs/` directory.
  - Example configuration files include those used in the training commands below, such as `configs/vae/slat_vae_dec_mesh_swin8_B_64l8_fp16.json` and `configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json`.
The training script can be run as follows:
```sh
usage: train.py [-h] --config CONFIG --output_dir OUTPUT_DIR [--load_dir LOAD_DIR] [--ckpt CKPT] [--data_dir DATA_DIR] [--auto_retry AUTO_RETRY] [--tryrun] [--profile] [--num_nodes NUM_NODES] [--node_rank NODE_RANK] [--num_gpus NUM_GPUS] [--master_addr MASTER_ADDR] [--master_port MASTER_PORT]

options:
  -h, --help                 show this help message and exit
  --config CONFIG            Experiment config file
  --output_dir OUTPUT_DIR    Output directory
  --load_dir LOAD_DIR        Load directory, default to output_dir
  --ckpt CKPT                Checkpoint step to resume training, default to latest
  --data_dir DATA_DIR        Data directory
  --auto_retry AUTO_RETRY    Number of retries on error
  --tryrun                   Try run without training
  --profile                  Profile training
  --num_nodes NUM_NODES      Number of nodes
  --node_rank NODE_RANK      Node rank
  --num_gpus NUM_GPUS        Number of GPUs per node, default to all
  --master_addr MASTER_ADDR  Master address for distributed training
  --master_port MASTER_PORT  Port for distributed training
```
To train an image-to-3D stage 2 model on a single machine:
```sh
python train.py \
    --config configs/vae/slat_vae_dec_mesh_swin8_B_64l8_fp16.json \
    --output_dir outputs/slat_vae_dec_mesh_swin8_B_64l8_fp16_1node \
    --data_dir /path/to/your/dataset1,/path/to/your/dataset2
```
The script will automatically distribute the training across all available GPUs. Specify the number of GPUs with the `--num_gpus` flag if you want to limit the number of GPUs used; a sketch follows.
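For example, to limit the same run to four GPUs (the GPU count here is illustrative):

```sh
python train.py \
    --config configs/vae/slat_vae_dec_mesh_swin8_B_64l8_fp16.json \
    --output_dir outputs/slat_vae_dec_mesh_swin8_B_64l8_fp16_1node \
    --data_dir /path/to/your/dataset1,/path/to/your/dataset2 \
    --num_gpus 4
```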
To train an image-to-3D stage 2 model with multiple GPUs across nodes (e.g., 2 nodes):
```sh
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/slat_flow_img_dit_L_64l8p2_fp16_2nodes \
    --data_dir /path/to/your/dataset1,/path/to/your/dataset2 \
    --num_nodes 2 \
    --node_rank 0 \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
```
Be sure to adjust `--node_rank`, `--master_addr`, and `--master_port` for each node accordingly; the command for the second node is sketched below.
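A sketch for the second node, assuming the same `$MASTER_ADDR` and `$MASTER_PORT` are reachable from both machines:

```sh
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/slat_flow_img_dit_L_64l8p2_fp16_2nodes \
    --data_dir /path/to/your/dataset1,/path/to/your/dataset2 \
    --num_nodes 2 \
    --node_rank 1 \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
```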
By default, training will resume from the latest saved checkpoint in the same output directory. To resume from a specific checkpoint, use the `--load_dir` and `--ckpt` flags:
```sh
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/slat_flow_img_dit_L_64l8p2_fp16_resume \
    --data_dir /path/to/your/dataset1,/path/to/your/dataset2 \
    --load_dir /path/to/your/checkpoint \
    --ckpt [step]
```
- Auto Retry: Use the `--auto_retry` flag to specify the number of retries in case of intermittent errors.
- Dry Run: The `--tryrun` flag allows you to check your configuration and environment without launching full training (see the sketch after this list).
- Profiling: Enable profiling with the `--profile` flag to gain insights into training performance and diagnose bottlenecks.
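As a sketch, a dry run that validates a configuration and environment before committing to a long job might look like this (the output directory name is illustrative):

```sh
python train.py \
    --config configs/generation/slat_flow_img_dit_L_64l8p2_fp16.json \
    --output_dir outputs/tryrun_check \
    --data_dir /path/to/your/dataset1,/path/to/your/dataset2 \
    --tryrun
```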
Adjust the file paths and parameters to match your experimental setup.
TRELLIS models and the majority of the code are licensed under the MIT License. The following submodules may have different licenses:
- diffoctreerast: We developed a CUDA-based real-time differentiable octree renderer for rendering radiance fields as part of this project. This renderer is derived from the diff-gaussian-rasterization project and is available under the LICENSE.
- Modified Flexicubes: In this project, we used a modified version of Flexicubes to support vertex attributes. This modified version is licensed under the LICENSE.
If you find this work helpful, please consider citing our paper:
```bibtex
@article{xiang2024structured,
    title   = {Structured 3D Latents for Scalable and Versatile 3D Generation},
    author  = {Xiang, Jianfeng and Lv, Zelong and Xu, Sicheng and Deng, Yu and Wang, Ruicheng and Zhang, Bowen and Chen, Dong and Tong, Xin and Yang, Jiaolong},
    journal = {arXiv preprint arXiv:2412.01506},
    year    = {2024}
}
```