
Define the declarative input-schema synthesis algorithm (#declarative-api is currently a TODO) #210

Description

@shalugeoz

Summary

The declarative API needs a deterministic way to turn a <form> and its form-associated elements into a JSON Schema inputSchema, so agents can reliably fill out and submit declarative tools. Today this is unspecified in two places:

  • In the spec (#declarative-api), the "synthesize a declarative JSON Schema object" algorithm currently consists of a single placeholder step: "1. TODO: Derive a conformant JSON Schema object from |form| and its form-associated elements."
  • In the explainer, the "Input schema synthesis" section is also marked TBD, noting we need to specify how elements like <input> and <select> reduce to a JSON Schema.

I've joined the Web Machine Learning Community Group and would like to contribute a concrete proposal for this algorithm. I've also written a reference implementation that passes 11 test cases (details at the bottom). Sharing here for feedback before opening any PRs. It's intentionally conservative and aligned with the attribute model already in the explainer (toolname, tooldescription, name, toolparamdescription).

Proposed top-level shape

A <form> reduces to:

{
  "type": "object",
  "properties": { /* one entry per included, named control */ },
  "required": [ /* names of controls carrying the `required` constraint */ ]
}

  • Only named controls are included; submit/reset/button/image and unnamed/disabled controls are skipped.
  • Property key = the control's name; description comes from toolparamdescription; a control's value maps to default.
  • Controls sharing one name (radio groups, multi-checkboxes) are merged into a single property.
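
A minimal sketch of the form-level rules above, for discussion only (it is not the reference implementation mentioned later). It assumes each control is modeled as a plain Python dict rather than a DOM element; the names `is_included`, `synthesize`, and the `map_control` stub are hypothetical. Merging of same-named controls is left out here, and `default` is copied as the raw string value without type conversion.

```python
# Hypothetical model: each control is a plain dict standing in for a DOM element.
SKIPPED_TYPES = {"submit", "reset", "button", "image"}


def is_included(control):
    """Keep only named, enabled controls that are not buttons."""
    return (
        bool(control.get("name"))
        and not control.get("disabled", False)
        and control.get("type", "text") not in SKIPPED_TYPES
    )


def synthesize(controls, map_control=lambda control: {"type": "string"}):
    """Build the top-level object schema; map_control is a stub for the per-control step."""
    properties, required = {}, []
    for control in controls:
        if not is_included(control):
            continue
        prop = dict(map_control(control))
        if control.get("toolparamdescription"):
            prop["description"] = control["toolparamdescription"]
        if control.get("value"):
            # Assumption: the raw attribute string is used as the default.
            prop["default"] = control["value"]
        properties[control["name"]] = prop
        if control.get("required", False):
            required.append(control["name"])
    return {"type": "object", "properties": properties, "required": required}


if __name__ == "__main__":
    demo = [
        {"type": "text", "name": "make", "required": True},
        {"type": "submit", "name": "go"},
    ]
    print(synthesize(demo))
```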

Proposed control → schema mapping

| Control | JSON Schema | Constraints |
|---|---|---|
| text / search / tel / password | {"type":"string"} | minlength → minLength, maxlength → maxLength, pattern → pattern |
| email | {"type":"string","format":"email"} | multiple → array of email strings |
| url | {"type":"string","format":"uri"} | |
| number | {"type":"number"} | min → minimum, max → maximum, numeric step → multipleOf (omit for any) |
| range | {"type":"number"} | minimum default 0, maximum default 100 |
| date / datetime-local / time | {"type":"string","format":"date" \| "date-time" \| "time"} | |
| month / week / color | {"type":"string","pattern": …} | no standard format exists |
| checkbox (single) | {"type":"boolean"} | |
| radio group | {"type":"string","enum":[values]} | required if any member is required |
| checkbox group | {"type":"array","items":{"enum":[values]},"uniqueItems":true} | |
| <select> / <select multiple> | {"type":"string","enum":[…]} / {"type":"array",…} | option text used when value absent |
| <textarea> | {"type":"string"} | minlength/maxlength |
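
To make a few rows of the table concrete, here is a hedged sketch of the control-level mapping for the scalar types only. Controls are again modeled as plain dicts with string attribute values (an assumption); `map_control`, `string_constraints`, and `numeric_constraints` are hypothetical names. The sketch copies `pattern` verbatim as the table does. One thing it does not address: HTML matches `pattern` against the whole value, while JSON Schema `pattern` is unanchored.

```python
TEXT_TYPES = {"text", "search", "tel", "password"}


def number_value(text):
    """Parse an attribute string, preferring int when the value is integral."""
    value = float(text)
    return int(value) if value.is_integer() else value


def string_constraints(control, allow_pattern=True):
    out = {}
    if "minlength" in control:
        out["minLength"] = int(control["minlength"])
    if "maxlength" in control:
        out["maxLength"] = int(control["maxlength"])
    if allow_pattern and "pattern" in control:
        out["pattern"] = control["pattern"]  # copied verbatim, unanchored
    return out


def numeric_constraints(control, defaults=None):
    out = dict(defaults or {})
    if "min" in control:
        out["minimum"] = number_value(control["min"])
    if "max" in control:
        out["maximum"] = number_value(control["max"])
    step = control.get("step")
    if step is not None and step != "any":
        out["multipleOf"] = number_value(step)
    return out


def map_control(control):
    kind = control.get("type", "text")
    if kind in TEXT_TYPES:
        return {"type": "string", **string_constraints(control)}
    if kind == "textarea":
        return {"type": "string", **string_constraints(control, allow_pattern=False)}
    if kind == "email":
        item = {"type": "string", "format": "email"}
        return {"type": "array", "items": item} if control.get("multiple") else item
    if kind == "url":
        return {"type": "string", "format": "uri"}
    if kind == "number":
        return {"type": "number", **numeric_constraints(control)}
    if kind == "range":
        defaults = {"minimum": 0, "maximum": 100}
        return {"type": "number", **numeric_constraints(control, defaults)}
    if kind == "checkbox":
        return {"type": "boolean"}
    raise ValueError(f"type not covered by this sketch: {kind}")
```

Date-like types, groups, and `<select>` are deliberately left out of this sketch.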

Algorithm (sketch)

Two algorithms, mirroring the split of concerns:

  1. synthesize a declarative JSON Schema object (form-level): collect submittable named controls, group them by name, build properties and required.
  2. map a form control to a JSON Schema property (control-level): switch on the control's type per the table above, applying string/numeric constraint helpers.
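
The "group them by name" step in the first algorithm can be sketched as follows. This is an illustration under assumptions, not the Bikeshed text: controls are plain dicts, `merge_group` and `synthesize_groups` are hypothetical names, and only radio and checkbox groups are handled. The "required if any member is required" rule is applied to radio groups as in the table; how `required` should behave for checkbox groups is not stated above, so the sketch leaves them out of `required`.

```python
from collections import defaultdict


def merge_group(members):
    """Reduce all controls sharing one name to a single schema property."""
    kind = members[0]["type"]
    values = [member["value"] for member in members]
    if kind == "radio":
        return {"type": "string", "enum": values}
    if kind == "checkbox":
        if len(members) == 1:
            return {"type": "boolean"}
        return {"type": "array", "items": {"enum": values}, "uniqueItems": True}
    raise ValueError(f"grouping not covered by this sketch: {kind}")


def synthesize_groups(controls):
    groups = defaultdict(list)  # insertion order follows document order
    for control in controls:
        groups[control["name"]].append(control)

    properties, required = {}, []
    for name, members in groups.items():
        properties[name] = merge_group(members)
        kind = members[0]["type"]
        is_group_of_checkboxes = kind == "checkbox" and len(members) > 1
        if not is_group_of_checkboxes and any(m.get("required") for m in members):
            required.append(name)
    return properties, required
```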

I have these written out in full Bikeshed-style steps and am happy to paste them here or bring them directly in a PR, whichever the editors prefer.

Worked example

<form toolname="search-cars" tooldescription="Perform a car make/model search">
  <input type=text name="make" toolparamdescription="The vehicle's make" required>
  <input type=text name="model" toolparamdescription="The vehicle's model" required>
  <input type=number name="max_price" min="0" max="200000" step="500">
  <select name="fuel"><option>Petrol<option>Diesel<option value="ev">Electric</select>
</form>

synthesizes to:

{
  "type": "object",
  "properties": {
    "make":      { "type": "string", "description": "The vehicle's make" },
    "model":     { "type": "string", "description": "The vehicle's model" },
    "max_price": { "type": "number", "minimum": 0, "maximum": 200000, "multipleOf": 500 },
    "fuel":      { "type": "string", "enum": ["Petrol", "Diesel", "ev"] }
  },
  "required": ["make", "model"]
}
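
The `fuel` entry above relies on the "option text used when value absent" rule. A small sketch of that rule, assuming each `<option>` is modeled as a dict with a `text` key and an optional `value` key (`option_value` and `map_select` are hypothetical names; only the single-select case is shown):

```python
def option_value(option):
    """Return the value attribute if present, else the whitespace-collapsed text."""
    value = option.get("value")
    if value is not None:
        return value
    return " ".join(option["text"].split())


def map_select(options):
    """Map a single-select control to a string enum of its option values."""
    return {"type": "string", "enum": [option_value(option) for option in options]}


fuel_options = [
    {"text": "Petrol"},
    {"text": "Diesel"},
    {"text": "Electric", "value": "ev"},
]
print(map_select(fuel_options))
```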

Open questions (would like editor guidance)

  1. <input type=file> — binary uploads; defer until multimodal tool inputs (#41, #86) are resolved?
  2. <input type=hidden> — exclude (not agent-fillable), or expose as a fixed const?
  3. disabled / readonly — exclude disabled (not submitted); what about readonly?
  4. <datalist> — non-restrictive suggestions; map to examples rather than enum?
  5. step + min — strictly a discrete value set; is multipleOf an acceptable approximation, or should small ranges emit an enum?
  6. month / week — use an explicit regex pattern, or reference HTML's "valid month/week string" microsyntax via a note?
  7. format as assertion — JSON Schema treats format as annotation by default; should the spec require agents to treat email/uri/date as assertions?

Reference implementation & tests

To validate the design I wrote a small reference implementation of both algorithms and a test suite (11 cases: the example above, string constraints, all format types, email multiple, range defaults, single checkbox, radio group, checkbox group, select multiple, excluded/hidden/file/disabled controls, and textarea/color/month). All 11 pass. I'm glad to share the code, and — if useful — to contribute these as Web Platform Tests so the feature is testable end-to-end.

Proposed next steps

If this direction looks reasonable, I'd open two small PRs once the open questions settle:

  1. declarative-api-explainer.md — the mapping table + worked examples.
  2. index.bs — the two algorithms replacing the 1. TODO.

Feedback very welcome, especially on the open questions above. Thanks!
