Official Kotlin Multiplatform library for Cactus, a framework for deploying LLM and STT models locally in your app. Requires iOS 12.0+, Android API 24+.
Add the GitHub Packages repository to your `settings.gradle.kts`. The credentials are read from `local.properties` (loaded explicitly at the top of the file, since Gradle does not expose it automatically) or from environment variables:

```kotlin
// settings.gradle.kts

// Load local.properties so the GitHub credentials below can be resolved
val properties = java.util.Properties().apply {
    val localProperties = File(rootDir, "local.properties")
    if (localProperties.exists()) localProperties.inputStream().use { load(it) }
}

dependencyResolutionManagement {
    repositories {
        maven {
            name = "GitHubPackagesCactus"
            url = uri("https://maven.pkg.github.com/cactus-compute/cactus-kotlin")
            credentials {
                username = properties.getProperty("github.username") ?: System.getenv("GITHUB_ACTOR")
                password = properties.getProperty("github.token") ?: System.getenv("GITHUB_TOKEN")
            }
        }
    }
}
```
Add your GitHub username and token to `local.properties`:

```properties
github.username=your-username
github.token=your-personal-access-token
```

You can generate a personal access token by following the instructions in GitHub's documentation. The token needs the `read:packages` scope. Alternatively, set the credentials as the environment variables `GITHUB_ACTOR` and `GITHUB_TOKEN`.
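For example, in a CI environment or local shell (values are placeholders):

```sh
export GITHUB_ACTOR=your-username
export GITHUB_TOKEN=your-personal-access-token
```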
Then add the dependency to your common source set:

```kotlin
kotlin {
    sourceSets {
        commonMain {
            dependencies {
                implementation("com.cactus:library:0.3-beta.4")
            }
        }
    }
}
```
Add the required permissions to your `AndroidManifest.xml`:

```xml
<!-- For model downloads -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- For transcription -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```
For Android, initialize the Cactus context in your Activity's `onCreate()` method before using any SDK functionality:

```kotlin
import android.os.Bundle
import androidx.activity.ComponentActivity
import com.cactus.CactusContextInitializer

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // Initialize Cactus context (Android only - required)
        CactusContextInitializer.initialize(this)

        // ... rest of your code
    }
}
```
```kotlin
import com.cactus.services.CactusTelemetry

// Initialize telemetry for usage analytics (optional)
CactusTelemetry.setTelemetryToken("your_token_here")
```
The `CactusLM` class provides text completion capabilities with support for function calling (WIP).
```kotlin
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import com.cactus.CactusCompletionParams
import com.cactus.ChatMessage
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()

    // Download a model (default: qwen3-0.6)
    val downloadSuccess = lm.downloadModel("qwen3-0.6")

    // Initialize the model
    val initSuccess = lm.initializeModel(
        CactusInitParams(
            model = "qwen3-0.6",
            contextSize = 2048
        )
    )

    // Generate completion
    val result = lm.generateCompletion(
        messages = listOf(
            ChatMessage(content = "Hello, how are you?", role = "user")
        ),
        params = CactusCompletionParams(
            maxTokens = 100,
            temperature = 0.7,
            topK = 40,
            topP = 0.95
        )
    )

    result?.let { response ->
        if (response.success) {
            println("Response: ${response.response}")
            println("Tokens per second: ${response.tokensPerSecond}")
            println("Time to first token: ${response.timeToFirstTokenMs}ms")
        }
    }

    // Clean up
    lm.unload()
}
```
To stream tokens as they are generated, pass an `onToken` callback:

```kotlin
val result = lm.generateCompletion(
    messages = listOf(ChatMessage("Tell me a story", "user")),
    params = CactusCompletionParams(maxTokens = 200),
    onToken = { token, tokenId ->
        print(token) // Print each token as it's generated
    }
)
```
To use function calling, define tools and pass them in `CactusCompletionParams`:

```kotlin
import com.cactus.models.ToolParameter
import com.cactus.models.createTool

val tools = listOf(
    createTool(
        name = "get_weather",
        description = "Get current weather for a location",
        parameters = mapOf(
            "location" to ToolParameter(
                type = "string",
                description = "City name",
                required = true
            )
        )
    )
)

val result = lm.generateCompletion(
    messages = listOf(ChatMessage("What's the weather in New York?", "user")),
    params = CactusCompletionParams(
        maxTokens = 100,
        tools = tools
    )
)
```
The `generateCompletion` method supports different inference modes through the `mode` parameter, which takes an `InferenceMode` enum value. This lets you control whether the completion is generated locally on the device or remotely using a compatible API.

- `InferenceMode.LOCAL` (default): Generates the completion using the local on-device model.
- `InferenceMode.REMOTE`: Generates the completion using a remote API. Requires a `cactusToken`.
- `InferenceMode.LOCAL_FIRST`: Attempts to generate the completion locally first; if that fails, falls back to the remote API.
- `InferenceMode.REMOTE_FIRST`: Attempts to generate the completion remotely first; if that fails, falls back to the local on-device model.
Example using a remote-first strategy:
```kotlin
val result = lm.generateCompletion(
    messages = listOf(ChatMessage("What's the weather in New York?", "user")),
    params = CactusCompletionParams(
        maxTokens = 100,
        mode = InferenceMode.REMOTE_FIRST,
        cactusToken = "your_cactus_token"
    )
)
```
You can get a list of available models:

```kotlin
lm.getModels()
```
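A minimal sketch of inspecting the result (`getModels()` is suspending, so call it from a coroutine; this simply prints each `CactusModel` entry):

```kotlin
val models = lm.getModels()
models.forEach { println(it) }
```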
- `suspend fun downloadModel(model: String = "qwen3-0.6"): Boolean` - Download a model
- `suspend fun initializeModel(params: CactusInitParams): Boolean` - Initialize model for inference
- `suspend fun generateCompletion(messages: List<ChatMessage>, params: CactusCompletionParams, onToken: CactusStreamingCallback? = null): CactusCompletionResult?` - Generate text completion. Supports different inference modes (local, remote, and fallbacks).
- `fun unload()` - Free model from memory
- `suspend fun getModels(): List<CactusModel>` - Get available LLM models
- `fun isLoaded(): Boolean` - Check if model is loaded
- `CactusInitParams(model: String?, contextSize: Int?)` - Model initialization parameters
- `CactusCompletionParams(temperature: Double, topK: Int, topP: Double, maxTokens: Int, stopSequences: List<String>, bufferSize: Int, tools: List<Tool>?, mode: InferenceMode, cactusToken: String)` - Completion parameters (see the sketch after this list)
- `ChatMessage(content: String, role: String, timestamp: Long?)` - Chat message format
- `CactusCompletionResult` - Contains response, timing metrics, and success status
- `CactusEmbeddingResult(success: Boolean, embeddings: List<Double>, dimension: Int, errorMessage: String?)` - Embedding generation result
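A minimal sketch of setting completion parameters explicitly (values are illustrative; assumes `lm` was initialized as in the earlier example):

```kotlin
if (lm.isLoaded()) {
    val result = lm.generateCompletion(
        messages = listOf(ChatMessage(content = "List three fruits.", role = "user")),
        params = CactusCompletionParams(
            temperature = 0.2,              // lower = more deterministic output
            topK = 40,
            topP = 0.9,
            maxTokens = 64,
            stopSequences = listOf("\n\n")  // stop at the first blank line
        )
    )
    println(result?.response)
}
```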
The `CactusLM` class also provides text embedding generation for semantic similarity, search, and other NLP tasks.
```kotlin
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()

    // Download and initialize a model (same as for completions)
    lm.downloadModel("qwen3-0.6")
    lm.initializeModel(CactusInitParams(model = "qwen3-0.6", contextSize = 2048))

    // Generate embeddings for a text
    val result = lm.generateEmbedding(
        text = "This is a sample text for embedding generation",
        bufferSize = 2048
    )

    result?.let { embedding ->
        if (embedding.success) {
            println("Embedding dimension: ${embedding.dimension}")
            println("Embedding vector length: ${embedding.embeddings.size}")
        } else {
            println("Embedding generation failed: ${embedding.errorMessage}")
        }
    }

    lm.unload()
}
```
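The returned vectors can be compared for semantic similarity. A minimal cosine-similarity helper (not part of the SDK) over the `List<Double>` vectors in `CactusEmbeddingResult.embeddings`:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal dimension
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Embedding dimensions must match" }
    val dot = a.zip(b).sumOf { (x, y) -> x * y }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}
```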
- `suspend fun generateEmbedding(text: String, bufferSize: Int = 2048): CactusEmbeddingResult?` - Generate text embeddings
- `CactusEmbeddingResult(success: Boolean, embeddings: List<Double>, dimension: Int, errorMessage: String?)` - Contains the generated embedding vector and metadata
The `CactusSTT` class provides speech recognition capabilities using on-device models from providers like Vosk and Whisper.

You can select a transcription provider when initializing `CactusSTT`. The available providers are:

- `TranscriptionProvider.VOSK` (default): Uses Vosk for transcription.
- `TranscriptionProvider.WHISPER`: Uses Whisper for transcription.
```kotlin
import com.cactus.CactusSTT
import com.cactus.TranscriptionProvider

// Initialize with the VOSK provider (default)
val sttVosk = CactusSTT()

// Or explicitly initialize with the WHISPER provider
val sttWhisper = CactusSTT(TranscriptionProvider.WHISPER)
```
```kotlin
import com.cactus.CactusSTT
import com.cactus.SpeechRecognitionParams
import kotlinx.coroutines.runBlocking

runBlocking {
    val stt = CactusSTT() // Defaults to VOSK provider

    // Download STT model (e.g., vosk-en-us)
    val downloadSuccess = stt.download("vosk-en-us")

    // Initialize the model
    val initSuccess = stt.init("vosk-en-us")

    // Transcribe from microphone
    val result = stt.transcribe(
        SpeechRecognitionParams(
            maxSilenceDuration = 1000L,
            maxDuration = 30000L,
            sampleRate = 16000
        )
    )

    result?.let { transcription ->
        if (transcription.success) {
            println("Transcribed: ${transcription.text}")
            println("Processing time: ${transcription.processingTime}ms")
        }
    }

    // Stop transcription
    stt.stop()
}
```
```kotlin
import com.cactus.CactusSTT
import com.cactus.SpeechRecognitionParams
import com.cactus.TranscriptionProvider
import kotlinx.coroutines.runBlocking

runBlocking {
    val stt = CactusSTT(TranscriptionProvider.WHISPER)

    // Download a Whisper model (e.g., whisper-tiny)
    val downloadSuccess = stt.download("whisper-tiny")

    // Initialize the model
    val initSuccess = stt.init("whisper-tiny")

    // Transcribe from an audio file
    val fileResult = stt.transcribe(
        params = SpeechRecognitionParams(),
        filePath = "/path/to/audio.wav"
    )

    fileResult?.let { transcription ->
        if (transcription.success) {
            println("Transcribed: ${transcription.text}")
        }
    }

    // Stop transcription
    stt.stop()
}
```
`CactusSTT` supports multiple transcription modes for flexibility between on-device and cloud-based processing. This is controlled by the `mode` parameter of the `transcribe` function.

- `TranscriptionMode.LOCAL` (default): Performs transcription locally on the device.
- `TranscriptionMode.REMOTE`: Performs transcription using a remote API (e.g., Wispr). Requires `filePath` and `apiKey`.
- `TranscriptionMode.LOCAL_FIRST`: Attempts local transcription first; if that fails, falls back to the remote API.
- `TranscriptionMode.REMOTE_FIRST`: Attempts remote transcription first; if that fails, falls back to the local model.
Example using a local-first fallback for a file:

```kotlin
// Transcribe from audio file with remote fallback
val fileResult = stt.transcribe(
    params = SpeechRecognitionParams(),
    filePath = "/path/to/audio.wav",
    mode = TranscriptionMode.LOCAL_FIRST,
    apiKey = "your_wispr_api_key"
)
```
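A sketch of a purely remote transcription, warming up the Wispr service first to reduce latency (the key and path are placeholders; `warmUpWispr` is described in the API reference below, and both calls are suspending, so run them from a coroutine):

```kotlin
// Warm up the remote Wispr service (optional, lowers first-request latency)
stt.warmUpWispr(apiKey = "your_wispr_api_key")

// Remote-only transcription of an audio file
val remoteResult = stt.transcribe(
    params = SpeechRecognitionParams(),
    filePath = "/path/to/audio.wav",
    mode = TranscriptionMode.REMOTE,
    apiKey = "your_wispr_api_key"
)
println(remoteResult?.text)
```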
You can get a list of available models for the configured provider:

```kotlin
// For VOSK (default)
val voskModels = CactusSTT().getVoiceModels()

// For WHISPER
val whisperModels = CactusSTT().getVoiceModels(TranscriptionProvider.WHISPER)

// Check if a model is downloaded
stt.isModelDownloaded("vosk-en-us")
```
- `CactusSTT(provider: TranscriptionProvider = TranscriptionProvider.VOSK)` - Constructor to specify the transcription provider.
- `suspend fun download(model: String): Boolean` - Download an STT model (e.g., "vosk-en-us" or "whisper-tiny-en").
- `suspend fun init(model: String): Boolean` - Initialize an STT model for transcription.
- `suspend fun transcribe(params: SpeechRecognitionParams = SpeechRecognitionParams(), filePath: String? = null, mode: TranscriptionMode = TranscriptionMode.LOCAL, apiKey: String? = null): SpeechRecognitionResult?` - Transcribe speech from microphone or file. Supports different transcription modes.
- `suspend fun warmUpWispr(apiKey: String)` - Warms up the remote Wispr service for lower latency.
- `fun stop()` - Stop ongoing transcription.
- `fun isReady(): Boolean` - Check if the STT service is initialized and ready.
- `suspend fun getVoiceModels(provider: TranscriptionProvider = TranscriptionProvider.VOSK): List<VoiceModel>` - Get a list of available voice models for the configured provider.
- `suspend fun isModelDownloaded(modelName: String): Boolean` - Check if a specific model has been downloaded.
- `TranscriptionProvider` - Enum for selecting the provider (`VOSK`, `WHISPER`).
- `SpeechRecognitionParams(maxSilenceDuration: Long, maxDuration: Long, sampleRate: Int)` - Parameters for controlling speech recognition.
- `SpeechRecognitionResult(success: Boolean, text: String?, processingTime: Double?)` - The result of a transcription.
- `VoiceModel` - Contains information about an available voice model.
Android:

- Works automatically - native libraries included
- Requires API 24+ (Android 7.0)
- ARM64 architecture supported

iOS:

- Add the Cactus package dependency in Xcode
- Requires iOS 12.0+
- Supports ARM64 and Simulator ARM64
To build the library from source:

```sh
# Build the library and publish to mavenLocal
./build_library.sh
```

Navigate to the example app and run it:

```sh
cd kotlin/example

# For desktop
./gradlew :composeApp:run

# For Android/iOS - use Android Studio or Xcode
```
The example app demonstrates:
- Model downloading and initialization
- Text completion with streaming
- Function calling
- Speech-to-text transcription
- Error handling and status management
- Model Selection: Choose smaller models for faster inference on mobile devices
- Context Size: Reduce context size for lower memory usage
- Memory Management: Always call `unload()` when done with models (see the sketch below)
- Batch Processing: Reuse initialized models for multiple completions
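A minimal sketch of the reuse-then-unload pattern (model name and prompts are illustrative):

```kotlin
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import com.cactus.CactusCompletionParams
import com.cactus.ChatMessage
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()
    lm.downloadModel("qwen3-0.6")
    lm.initializeModel(CactusInitParams(model = "qwen3-0.6", contextSize = 2048))
    try {
        // Reuse the same initialized model for several completions
        for (prompt in listOf("First question", "Second question")) {
            val result = lm.generateCompletion(
                messages = listOf(ChatMessage(content = prompt, role = "user")),
                params = CactusCompletionParams(maxTokens = 64)
            )
            println(result?.response)
        }
    } finally {
        // Always free the model from memory when done
        lm.unload()
    }
}
```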
- Documentation
- Discord Community
- Issues
- Models on Hugging Face