LoRA extraction fixes (#522)
Addresses #521.
Also adds:
* `--lora-merge-dtype` to specify the dtype used when applying LoRA adapters to models
* a `--gpu-rich` alias for convenience
* organized display of options in `--help`
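As a rough illustration of what a merge dtype controls, here is a minimal sketch of folding a LoRA adapter into a base weight matrix. This is not mergekit's implementation; `merge_lora` and its arguments are hypothetical, and the point is only that computing the delta in a wider dtype (as `--lora-merge-dtype` allows) avoids rounding error when the base weights are stored in low precision.

```python
import numpy as np

def merge_lora(base, lora_a, lora_b, scale=1.0, merge_dtype=np.float32):
    """Fold a LoRA adapter (delta = B @ A) into a base weight matrix.

    merge_dtype plays the role of --lora-merge-dtype: the delta is computed
    and added in this dtype, then the result is cast back to the base dtype.
    """
    delta = (lora_b.astype(merge_dtype) @ lora_a.astype(merge_dtype)) * scale
    return (base.astype(merge_dtype) + delta).astype(base.dtype)

# Toy example with half-precision weights and a rank-2 adapter.
rng = np.random.default_rng(0)
base = rng.standard_normal((8, 8)).astype(np.float16)
A = rng.standard_normal((2, 8)).astype(np.float16)  # (rank, in_features)
B = rng.standard_normal((8, 2)).astype(np.float16)  # (out_features, rank)
merged = merge_lora(base, A, B, merge_dtype=np.float64)
```

The merged tensor keeps the base model's dtype; only the intermediate arithmetic happens in the wider `merge_dtype`.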
Compute-graph based `mergekit-extract-lora` (#505)
Now with better embedding handling, multi-gpu execution, and lazy
loading/saving of tensors.
When extracting a LoRA from an 8B model, execution time goes from ~6
minutes down to 40 seconds with `--cuda --multi-gpu` on an 8-GPU
machine.
Additionally, the `--sv-epsilon` flag sets a tolerance on singular values,
opportunistically reducing rank when the fine-tuned weight difference is
inherently lower rank.
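To make the singular-value tolerance concrete, here is a minimal sketch of SVD-based LoRA extraction with rank truncation. The function name, signature, and details are hypothetical and simplified relative to mergekit's compute-graph implementation; it only demonstrates the idea behind `--sv-epsilon`: singular values at or below the tolerance are dropped, so an inherently low-rank difference yields a smaller adapter.

```python
import numpy as np

def extract_lora(base, finetuned, max_rank=16, sv_epsilon=0.0):
    """Factor the weight difference into low-rank A/B matrices via SVD.

    sv_epsilon mirrors --sv-epsilon: singular values <= sv_epsilon are
    discarded, reducing the rank below max_rank when possible.
    """
    delta = finetuned - base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    rank = max(1, min(max_rank, int(np.sum(s > sv_epsilon))))
    lora_b = u[:, :rank] * s[:rank]  # (out_features, rank)
    lora_a = vt[:rank, :]            # (rank, in_features)
    return lora_a, lora_b

# Toy example: a fine-tuned weight whose difference is exactly rank 3.
rng = np.random.default_rng(0)
base = rng.standard_normal((16, 16))
delta = rng.standard_normal((16, 3)) @ rng.standard_normal((3, 16))
finetuned = base + delta
A, B = extract_lora(base, finetuned, max_rank=8, sv_epsilon=1e-6)
```

With the tolerance set, the extracted adapter ends up rank 3 rather than the requested maximum of 8, and `B @ A` reconstructs the difference.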
Also reimplements a couple of merge methods using the `@easy_define`
decorator and adds some missing tests.