Smoltalk exposes a common API across different LLM providers. Other packages do this too, but Smoltalk also lets you build strategies (like fallbacks and races) on top of that API. Here is a simple example.
```sh
pnpm install smoltalk
```
```ts
import { text, userMessage } from "smoltalk";

async function main() {
  const messages = [userMessage("Write me a 10 word story.")];
  const response = await text({
    messages,
    model: "gpt-5.4",
  });
  console.log(response);
}

main();
```
So far, this is functionality other packages offer too. The call above prints something like:
```
{
  success: true,
  value: {
    output: 'Clock stopped; everyone smiled as tomorrow finally arrived before yesterday.',
    toolCalls: [],
    usage: {
      inputTokens: 14,
      outputTokens: 15,
      cachedInputTokens: 0,
      totalTokens: 29
    },
    cost: {
      inputCost: 0.000035,
      outputCost: 0.000225,
      cachedInputCost: undefined,
      totalCost: 0.00026,
      currency: 'USD'
    },
    model: 'gpt-5.4'
  }
}
```
What if you wanted a fallback in case the OpenAI API was down? Just change the `model` field:
```ts
const response = await text({
  messages,
  model: fallback("gpt-5.4", "gemini-2.5-flash-lite"),
  // or multiple fallbacks:
  // model: fallback("gpt-5.4", ["gemini-2.5-flash-lite", "gemini-3-flash-preview"]),
});
```
Or what if you wanted to try a couple of models and take the first response?
```ts
const response = await text({
  messages,
  model: race("gpt-5.4", "gemini-2.5-flash-lite", "o4-mini"),
});
```
Or combine them:
```ts
const response = await text({
  messages,
  model: race(fallback("gpt-5.4", "gemini-2.5-flash-lite"), "o4-mini"),
});
```
You get the idea.
To use Smoltalk, you first create a client:
```ts
import { getClient } from "smoltalk";

const client = getClient({
  openAiApiKey: process.env.OPENAI_API_KEY || "",
  googleApiKey: process.env.GEMINI_API_KEY || "",
  logLevel: "debug",
  model: "gemini-2.0-flash-lite",
});
```
Then you can call different methods on the client. The simplest is `prompt`:
```ts
const resp = await client.prompt("Hello, how are you?");
```
If you want tool calling, structured output, etc., `text` may be a cleaner option:
```ts
import { userMessage, type Message } from "smoltalk";

let messages: Message[] = [];
messages.push(
  userMessage(
    "Please use the add function to add the following numbers: 3 and 5"
  )
);

const resp = await client.text({
  messages,
});
```
Here is an example with tool calling:
```ts
import { z } from "zod";

function add({ a, b }: { a: number; b: number }): number {
  return a + b;
}

const addTool = {
  name: "add",
  description: "Adds two numbers together and returns the result.",
  schema: z.object({
    a: z.number().describe("The first number to add"),
    b: z.number().describe("The second number to add"),
  }),
};

const resp = await client.text({
  messages,
  tools: [addTool],
});
```
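Note that defining a tool does not execute it: the model decides to call `add`, and the call comes back in `toolCalls` (as in the response shape shown earlier), leaving execution to you. Below is a minimal dispatcher sketch; the `ToolCall` shape and `runToolCalls` helper are assumptions for illustration, not Smoltalk's actual API.

```ts
// Hypothetical shape of a returned tool call; check Smoltalk's types for the real one.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// A registered tool implementation.
type ToolImpl = (args: any) => unknown;

// Execute each tool call against the registered implementations,
// returning one result per call (or an error marker for unknown tools).
function runToolCalls(
  calls: ToolCall[],
  impls: Record<string, ToolImpl>
): unknown[] {
  return calls.map((call) => {
    const impl = impls[call.name];
    if (!impl) return { error: `unknown tool: ${call.name}` };
    return impl(call.arguments);
  });
}

// Example: dispatching the add call from above.
// runToolCalls([{ name: "add", arguments: { a: 3, b: 5 } }], { add }) // → [8]
```

You would then append each result to `messages` as a tool-result message and call `text` again so the model can produce its final answer.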
Here is an example with structured output:
```ts
const resp = await client.text({
  messages,
  responseFormat: z.object({
    result: z.number(),
  }),
});
```
A couple of design decisions to note: `SmolPromptConfig` is the union of client config (`SmolConfig`) and per-request config (`PromptConfig`). You can pass all options together to `text()`, or split them between `getClient()` and individual calls.
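The union design amounts to merging two config objects, with per-request options taking precedence where they overlap. This standalone sketch illustrates the idea with simplified stand-in types; they are assumptions, not Smoltalk's real definitions:

```ts
// Simplified stand-ins for Smoltalk's config types (illustration only).
interface SmolConfig {
  model: string;
  logLevel?: string;
}
interface PromptConfig {
  messages: string[];
  temperature?: number;
}
type SmolPromptConfig = SmolConfig & PromptConfig;

// Merge client-level and per-request config; request options win on overlap.
function mergeConfig(
  client: SmolConfig,
  request: Partial<SmolConfig> & PromptConfig
): SmolPromptConfig {
  return { ...client, ...request };
}
```

This is why a `model` set on `getClient()` can be overridden per call without restating the API keys or log level each time.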
Client config (`SmolConfig`)

| Option | Type | Description |
|---|---|---|
| `model` | `ModelName \| ModelConfig` | Required. The model to use (e.g. `"gpt-4o"`, `"gemini-2.0-flash-lite"`). |
| `openAiApiKey` | `string` | OpenAI API key. |
| `googleApiKey` | `string` | Google Gemini API key. |
| `ollamaApiKey` | `string` | Ollama API key (only needed for cloud Ollama). |
| `ollamaHost` | `string` | Ollama host URL (for self-hosted or cloud Ollama). |
| `provider` | `Provider` | Override provider detection. One of `"openai"`, `"openai-responses"`, `"google"`, `"ollama"`, `"anthropic"`, `"replicate"`, `"modal"`, `"local"`. |
| `logLevel` | `LogLevel` | Logging verbosity: `"debug"`, `"info"`, `"warn"`, `"error"`, etc. |
| `toolLoopDetection` | `ToolLoopDetection` | Config to detect and break tool call loops. See below. |
Per-request config (`PromptConfig`)

| Option | Type | Description |
|---|---|---|
| `messages` | `Message[]` | Required. The conversation messages to send. |
| `instructions` | `string` | System-level instructions (system prompt). |
| `tools` | `{ name, description?, schema }[]` | Tool definitions. `schema` is a Zod object schema. |
| `responseFormat` | `ZodType` | Zod schema for structured output. The response will be parsed and validated against this schema. |
| `responseFormatOptions` | `object` | Fine-grained control over structured output (see below). |
| `maxTokens` | `number` | Maximum number of output tokens to generate. |
| `temperature` | `number` | Sampling temperature (0–2 for most providers). |
| `numSuggestions` | `number` | Number of completions to generate. |
| `parallelToolCalls` | `boolean` | Whether to allow the model to call multiple tools in parallel. |
| `stream` | `boolean` | If `true`, returns an `AsyncGenerator<StreamChunk>` instead of a `Promise`. |
| `maxMessages` | `number` | If the message list exceeds this count, returns a failure instead of calling the API. |
| `rawAttributes` | `Record<string, any>` | Pass provider-specific attributes directly to the API request. |
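For `stream: true`, the consumption pattern is a standard `for await` loop over the generator. The sketch below uses an assumed chunk shape (a `text` field per chunk) and a stand-in generator, since the real `StreamChunk` fields are not shown here; a real call would be `await client.text({ messages, stream: true })`.

```ts
// Assumed chunk shape; check Smoltalk's StreamChunk type for the real fields.
interface StreamChunk {
  text: string;
}

// Collect a streamed response into a single string.
async function collect(stream: AsyncGenerator<StreamChunk>): Promise<string> {
  let out = "";
  for await (const chunk of stream) {
    out += chunk.text; // in a UI you would render each chunk as it arrives
  }
  return out;
}

// Stand-in generator for demonstration purposes only.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { text: "Hello, " };
  yield { text: "world." };
}
```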
`responseFormatOptions`

Used with `responseFormat` to control validation behavior (currently OpenAI only).

| Option | Type | Default | Description |
|---|---|---|---|
| `name` | `string` | | Name for the response format schema. |
| `strict` | `boolean` | | Whether to use strict schema validation. |
| `numRetries` | `number` | `2` | How many times to retry if the response fails schema validation. |
| `allowExtraKeys` | `boolean` | | If `true`, strips unexpected keys instead of failing validation. |
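The `allowExtraKeys` behavior amounts to dropping any keys the schema doesn't declare before validation runs. Here is a standalone sketch of that stripping step, written without Zod to stay self-contained; it is an illustration of the idea, not Smoltalk's actual implementation:

```ts
// Remove any keys not in the allowed set. This mirrors what allowExtraKeys: true
// does before schema validation (sketch only, not Smoltalk's code).
function stripExtraKeys<T extends Record<string, unknown>>(
  value: T,
  allowed: string[]
): Partial<T> {
  const out: Partial<T> = {};
  for (const key of allowed) {
    if (key in value) {
      (out as any)[key] = value[key];
    }
  }
  return out;
}
```

So a model response of `{ result: 8, explanation: "..." }` against a schema declaring only `result` would be trimmed to `{ result: 8 }` rather than rejected.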
`toolLoopDetection`

Detects when the model is stuck in a repetitive tool-call loop.

| Option | Type | Description |
|---|---|---|
| `enabled` | `boolean` | Whether loop detection is active. |
| `maxConsecutive` | `number` | Number of consecutive identical tool calls before triggering intervention. |
| `intervention` | `string` | Action to take: `"remove-tool"`, `"remove-all-tools"`, `"throw-error"`, or `"halt-execution"`. |
| `excludeTools` | `string[]` | Tool names to ignore when counting consecutive calls. |
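The core of loop detection is counting the trailing run of identical tool calls and firing an intervention once the run reaches `maxConsecutive`. This standalone sketch shows the idea (comparing calls by name only, which is a simplification); it is not Smoltalk's actual implementation:

```ts
// Decide whether an intervention should fire, given the tool-call history
// in order. Sketch of the toolLoopDetection idea, not Smoltalk's code.
function shouldIntervene(
  callNames: string[],
  maxConsecutive: number,
  excludeTools: string[] = []
): boolean {
  if (callNames.length === 0) return false;
  const last = callNames[callNames.length - 1];
  // Excluded tools never count toward a loop.
  if (excludeTools.includes(last)) return false;
  // Count the trailing run of identical calls.
  let run = 0;
  for (let i = callNames.length - 1; i >= 0 && callNames[i] === last; i--) {
    run++;
  }
  return run >= maxConsecutive;
}
```

A real implementation would also compare arguments (identical name with different arguments is often legitimate) and then apply the configured `intervention`.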
Smoltalk currently supports a limited number of providers and is mostly focused on stateless text-completion APIs, though I plan to add more providers as well as image and speech models later. Smoltalk is also a personal project, and there are alternatives backed by companies:
Contributions are welcome. Any of the following contributions would be helpful: