42 changes: 31 additions & 11 deletions README.md
@@ -458,28 +458,48 @@ The promise returned by `append()` will reject if the prompt cannot be appended

Note that `append()` can also cause [overflow](#tokenization-context-window-length-limits-and-overflow), in which case it will evict the oldest non-system prompts from the session and fire the `"contextoverflow"` event.

### Configuration of per-session parameters
### Configuration of sampling modes

Tuning language model sampling parameters can be useful for both testing and adjusting task-specific model behavior. Common sampling parameters include [temperature](https://huggingface.co/blog/how-to-generate#sampling) and [topK](https://huggingface.co/blog/how-to-generate#top-k-sampling).
For standard web page contexts, developers can specify a high-level `samplingMode` during session creation to configure the model's output variety and creativity without worrying about model-internal scalar parameters.

Collaborator

Drop the "For standard web page contexts" since this is experimentally available for web and extensions.

Suggested change
For standard web page contexts, developers can specify a high-level `samplingMode` during session creation to configure the model's output variety and creativity without worrying about model-internal scalar parameters.
Developers can specify a high-level `samplingMode` during session creation to configure the model's output variety and creativity without worrying about model-internal scalar parameters.


**Notice:** Sampling parameter features are currently only available within extension and experimental contexts. While they are useful for exploring model behavior, the current fields are not guaranteed to be supported or interpreted consistently across all models or user agents.
The allowed values for `samplingMode` are:
* `"most-predictable"`: For tasks requiring strict consistency and reproducibility (e.g., code generation or factual extraction).

Collaborator

nit: mention testing, replace factual with content.

Suggested change
* `"most-predictable"`: For tasks requiring strict consistency and reproducibility (e.g., code generation or factual extraction).
* `"most-predictable"`: For tasks requiring strict consistency and reproducibility (e.g., testing, code generation, or content extraction).

* `"predictable"`: For focused outputs with minimal variation.
* `"balanced"` (default): The standard preset for most conversational interactions.
* `"creative"`: For tasks where variety and creativity are preferred over strict factual reproducibility.

Collaborator

Suggested change
* `"creative"`: For tasks where variety and creativity are preferred over strict factual reproducibility.
* `"creative"`: For tasks where variety and creativity are preferred over strict reproducibility.

* `"most-creative"`: For maximum diversity of tokens and creative brainstorming.

Collaborator

Suggested change
* `"most-creative"`: For maximum diversity of tokens and creative brainstorming.
* `"most-creative"`: For maximum diversity of output and creative brainstorming.


_The limited applicability and non-universal nature of these sampling hyperparameters are discussed further in [issue #42](https://github.com/webmachinelearning/prompt-api/issues/42): sampling hyperparameters are not universal among models._
Example:
```js
const creativeSession = await LanguageModel.create({
  samplingMode: "creative"
});
console.log(creativeSession.samplingMode); // "creative"
```

The resolved `samplingMode` used to create the session is exposed as a read-only attribute on the session object.
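
When the mode comes from user input or configuration, it may help to check it against the five values listed above before calling `create()`. The sketch below is illustrative only: `SAMPLING_MODES`, `pickSamplingMode()`, and `buildCreateOptions()` are hypothetical helpers, not part of the API, and falling back to `"balanced"` is an assumption based on it being the documented default.

```js
// The five values listed above; "balanced" is the documented default.
const SAMPLING_MODES = [
  "most-predictable",
  "predictable",
  "balanced",
  "creative",
  "most-creative",
];

// Hypothetical helper: returns `requested` if it is a listed mode,
// otherwise falls back to "balanced" (an assumption, see above).
function pickSamplingMode(requested) {
  return SAMPLING_MODES.includes(requested) ? requested : "balanced";
}

// Hypothetical helper: builds an options object for LanguageModel.create(),
// keeping any other options the caller supplies.
function buildCreateOptions(requested, extra = {}) {
  return { ...extra, samplingMode: pickSamplingMode(requested) };
}

// Possible usage where the API is available:
// const session = await LanguageModel.create(buildCreateOptions(userChoice));
```

Validating up front keeps an unrecognized string from reaching `create()`, where an enum mismatch would otherwise surface as a rejected promise.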

### Legacy: Configuration of per-session raw parameters

**Deprecation Notice:** The `topK` and `temperature` options for `LanguageModel.create()`, the `LanguageModel.params()` static method, and the `languageModel.topK` and `languageModel.temperature` instance attributes are now **deprecated**. These features are only functional within web extension contexts and will be ignored in standard web page contexts. They may be completely removed in a future release.

To avoid breaking existing pages, standard web page contexts can still pass `topK` and `temperature` in the options object without throwing an error (a deprecation warning will be logged in the console), but they are ignored at runtime and the corresponding properties on the session object will be `undefined` (or fall back to default values).

In extension and experimental contexts:
* The `LanguageModel.params()` static method provides default and maximum values for temperature and topK parameters, once the user agent has ascertained or downloaded the specific underlying model.
* The `temperature` and `topK` instance attributes provide the current values for these parameters for a given session.
* Sampling parameters can also be configured at session creation time via the `temperature` and `topK` options for `LanguageModel.create()`
Furthermore, in contexts where raw parameters are supported (e.g. Web Extensions), passing both `samplingMode` and a raw parameter (`topK` or `temperature`) will reject the `create()` promise with a `TypeError`.
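
Code that assembles options from several sources could check for this conflict before calling `create()`. A minimal sketch, assuming a hypothetical `findSamplingConflict()` helper that is not part of the API:

```js
// Hypothetical pre-flight check; not part of the Prompt API.
// Returns the raw parameter names that conflict with `samplingMode`,
// or an empty array when the options are safe to combine.
function findSamplingConflict(options = {}) {
  if (options.samplingMode === undefined) {
    return [];
  }
  return ["topK", "temperature"].filter((key) => options[key] !== undefined);
}

const conflicts = findSamplingConflict({ samplingMode: "creative", topK: 10 });
if (conflicts.length > 0) {
  console.warn(`Remove ${conflicts.join(", ")} or samplingMode before calling create().`);
}
```

Members set to `undefined` are treated as absent here, which mirrors how dictionary members are usually read from a JavaScript options object.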

The `LanguageModel.params()` API, only available in extensions, can be used to query the default and maximum values for these parameters.

_The limited applicability and non-universal nature of these sampling hyperparameters are discussed further in [issue #42](https://github.com/webmachinelearning/prompt-api/issues/42) and [issue #203](https://github.com/webmachinelearning/prompt-api/issues/203)._

```js
// Sampling parameter support is limited to extension and experimental web contexts.
// Accessors are undefined, and options are ignored, outside of those contexts.
// The topK and temperature members of the options object are deprecated. They will only be considered when
// LanguageModel.create() is called from within a web extension. In web page contexts, they are ignored.
const customSession = await LanguageModel.create({
  temperature: 0.8,
  topK: 10
});
// This interface and all its attributes (`defaultTopK`, `maxTopK`, `defaultTemperature`, `maxTemperature`)
// are now only available within web extension contexts. Web pages can no longer call this method.
const params = await LanguageModel.params();
const conditionalSession = await LanguageModel.create({
  temperature: isCreativeTask ? params.defaultTemperature * 1.1 : params.defaultTemperature * 0.8,
@@ -489,7 +509,7 @@ const conditionalSession = await LanguageModel.create({

If the language model is not available at all in this browser, `params()` will fulfill with `null`.

Error-handling behavior:
Error-handling behavior (only applicable in contexts where legacy parameters are active, e.g. Web Extensions):

* If values below 0 are passed for `temperature`, then `create()` will return a promise rejected with a `RangeError`.
* If values above `maxTemperature` are passed for `temperature`, then `create()` will clamp to `maxTemperature`. (`+Infinity` is specifically allowed, as a way of requesting maximum temperature.)
6 changes: 6 additions & 0 deletions index.bs
@@ -85,6 +85,8 @@ interface LanguageModel : EventTarget {
// **EXPERIMENTAL**: Only available in extension and experimental contexts.
readonly attribute float temperature;

readonly attribute LanguageModelSamplingMode samplingMode;

Promise<LanguageModel> clone(optional LanguageModelCloneOptions options = {});
};
LanguageModel includes DestroyableModel;
@@ -119,6 +121,8 @@ dictionary LanguageModelCreateCoreOptions {
// **EXPERIMENTAL**: Only available in extension and experimental contexts.
unrestricted double temperature;

LanguageModelSamplingMode samplingMode = "balanced";

sequence<LanguageModelExpected> expectedInputs;
sequence<LanguageModelExpected> expectedOutputs;
sequence<LanguageModelTool> tools;
@@ -172,6 +176,8 @@ dictionary LanguageModelMessageContent {
required LanguageModelMessageValue value;
};

enum LanguageModelSamplingMode { "most-predictable", "predictable", "balanced", "creative", "most-creative" };

enum LanguageModelMessageRole { "system", "user", "assistant" };

enum LanguageModelMessageType { "text", "image", "audio", "tool-call", "tool-response" };