- Wellington, NZ
- https://drib.net
- @dribnet
Stars
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Derivative-Free Guidance in Diffusion Models with Soft Value-Based Decoding. For controlled generation in DNA, RNA, proteins, molecules (+ images)
Derivative-Free, Training-Free Guidance in Diffusion Models
A Python module that uses PIL/Pillow to give images a halftone effect (a toy Pillow sketch of the idea appears after this list)
Testing prompts with SDXL
Sparsify transformers with SAEs and transcoders
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A Python package for analyzing and transforming neural latent spaces.
Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
A SciPy implementation of the void-and-cluster method for generating blue-noise textures of arbitrary dimension (a toy 2D sketch of the idea appears after this list)
Official implementation & data for paper "Strong and Precise Modulation of Human Percepts via Robustified ANNs" (NeurIPS 2023)
Explore and interpret large embeddings in your browser with interactive visualization! 📍
Swift app demonstrating Core ML Stable Diffusion
Code repository for Understanding Game-Playing Agents with Natural Language Annotations
Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
geotiff.js is a small library for parsing TIFF files for visualization or analysis. It is written in pure JavaScript and is usable both in the browser and in Node.js applications.
A JavaScript library for interpolating 2D scalar fields / 3D surfaces.
Optimizable stack of images at different resolutions, a useful representation of images for deep learning tasks. Docs: https://johnowhitaker.github.io/imstack/
A Python library that enables complex compositions of language models, such as scratchpads, chain of thought, tool use, selection-inference, and more.
Research code for pixel-based encoders of language (PIXEL)
Unified API to facilitate usage of pre-trained "perceptor" models, a la CLIP
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
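
The halftone entry above points at a Pillow-based module. Purely as an illustrative aside, and not that module's actual API, a minimal Pillow sketch of the general idea (grayscale the image, then draw one dot per cell sized by local darkness) might look like this:

```python
from PIL import Image, ImageDraw

def toy_halftone(img: Image.Image, cell: int = 8) -> Image.Image:
    """Rough halftone: one black dot per cell, radius scaled by cell darkness."""
    gray = img.convert("L")               # work in grayscale
    out = Image.new("L", gray.size, 255)  # white canvas
    draw = ImageDraw.Draw(out)
    w, h = gray.size
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            tile = gray.crop((x, y, min(x + cell, w), min(y + cell, h)))
            mean = sum(tile.getdata()) / max(1, tile.width * tile.height)
            r = (1.0 - mean / 255.0) * (cell / 2)  # darker cell -> bigger dot
            cx, cy = x + cell / 2, y + cell / 2
            draw.ellipse((cx - r, cy - r, cx + r, cy + r), fill=0)
    return out

# Usage sketch: toy_halftone(Image.open("photo.jpg"), cell=10).save("halftone.png")
```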
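Similarly, the void-and-cluster entry describes a SciPy implementation for arbitrary dimensions. As a hedged sketch of the underlying idea only (not that repository's code or API), a toy 2D version of the relaxation step, assuming a periodic Gaussian filter as the point-density measure, could look like:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_blue_noise(shape=(64, 64), frac=0.1, sigma=1.5, seed=0):
    """Toy 2D void-and-cluster relaxation; returns a binary blue-noise point set."""
    rng = np.random.default_rng(seed)
    n = int(frac * shape[0] * shape[1])
    pattern = np.zeros(shape, dtype=bool)
    pattern.ravel()[rng.choice(pattern.size, size=n, replace=False)] = True

    def density(p):
        # Periodic Gaussian blur: high where points cluster, low inside voids.
        return gaussian_filter(p.astype(float), sigma, mode="wrap")

    for _ in range(pattern.size):  # guard against pathological non-convergence
        e = density(pattern)
        # Tightest cluster: occupied pixel with the highest local density.
        cluster = np.unravel_index(np.where(pattern, e, -np.inf).argmax(), shape)
        pattern[cluster] = False
        # Largest void: empty pixel with the lowest local density.
        e = density(pattern)
        void = np.unravel_index(np.where(pattern, np.inf, e).argmin(), shape)
        pattern[void] = True
        if void == cluster:  # the removed point went straight back: converged
            break
    return pattern
```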