feat(chat): User-editable Custom Instructions (system prompt) delivered safely in the user turn#22
Conversation
…m prompt) Adds a "Custom Instructions" field (Settings modal) so learners can shape Cosmo's tone, style, persona, and expertise — the system-prompt practice the prompt-engineering course teaches. Why deliver them in the user turn instead of the system prompt: An earlier approach injected the user's text into the system prompt under a "HIGHEST PRIORITY — follow exactly and without exception" section, placed ABOVE the guardrails. That is a prompt-injection / jailbreak hole: a learner could type "ignore all previous instructions and guardrails..." and it would carry system-level authority over Cosmo's own safety rules and persona. LLMs treat system > user authority by construction. By delivering untrusted, user-typed instructions in the USER turn (wrapped with a note that they are user-level preferences, subordinate to the system prompt), the real system prompt and guardrails always outrank them structurally — instead of relying on a self-contradictory "this is highest priority but also obey the guardrails" wording. Bonus: instructions now ride with each message, so they apply to the next reply without forcing a new chat. Trusted vs untrusted split: - chat-config.json `systemPromptExtra` stays in the system prompt as trusted, course-author configuration (EXTRA_INSTRUCTIONS), but is now explicitly below the guardrails (defense in depth). - The in-app Custom Instructions field is untrusted learner input and is sent only in the user turn (CUSTOM_INSTRUCTIONS), never the system prompt. Changes: - protocol.yaml: add optional CUSTOM_INSTRUCTIONS to the user-message trigger; uiContent stays USER_MESSAGE so the rendered message is unaffected. - prompts/user-message.md: render CUSTOM_INSTRUCTIONS with an explicit precedence/refusal note. - prompts/system.md: replace the "highest priority" extra-instructions section with an absolute priority order (guardrails > trusted config > user custom instructions) and instruct refusal of override attempts. - lib/helpers.js: EXTRA_INSTRUCTIONS now comes only from trusted config. - public/app.js: Custom Instructions textarea in Settings; send the value with each message (initial send + regenerate/edit resend); no session-creation plumbing. - public/app.css: textarea styling. Co-authored-by: Cursor <cursoragent@cursor.com>
…ions flag Custom Instructions is an advanced-course feature and should not surface in practices that don't use it. Hide the field everywhere by default; a practice opts in by setting `showCustomInstructions: true` in chat-config.json. - public/app.js: only render the Custom Instructions settings section when chatConfig.showCustomInstructions is true. While hidden, selectedCustomInstructions stays empty, so CUSTOM_INSTRUCTIONS is never added to the trigger payload. - README.md: document showCustomInstructions and clarify that systemPromptExtra is trusted, course-author config that still ranks below Cosmo's guardrails. Defense in depth: the field also lives inside the Settings modal (so hideSettings already hides it) and is never persisted (it resets to empty on each page load), so previous chats are never affected retroactively. Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughWalkthroughA new 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@public/app.js`:
- Around line 471-476: The heading "System prompt" in the h3 element with class
settings-section__title is misleading because the customInstructionsEl textarea
field is for user-turn input, not system-level prompts. Change the heading text
from "System prompt" to "Custom Instructions" or "Response preferences" to
accurately reflect that this field contains user customizations rather than
system-level configuration, which will prevent confusion about authority and
override behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 47ffe91d-95b0-4655-b9a0-07ff25986960
📒 Files selected for processing (7)
README.mdagents/cosmo-tutor/prompts/system.mdagents/cosmo-tutor/prompts/user-message.mdagents/cosmo-tutor/protocol.yamllib/helpers.jspublic/app.csspublic/app.js
| <h3 class="label-small settings-section__title">System prompt</h3> | ||
|
|
||
| <div class="settings-row"> | ||
| <label class="body-small settings-row__label" for="customInstructionsEl">Custom Instructions</label> | ||
| <p class="body-xsmall settings-row__desc">Shape Cosmo's tone, style, persona, or expertise. Sent with each message and applied to your next reply. Cosmo's core guidelines and safety guardrails always take precedence.</p> | ||
| <textarea id="customInstructionsEl" class="settings-textarea body-small" rows="5" placeholder="e.g. You are an expert in Data Science with an IQ of 159. Maintain a positive, helpful style."></textarea> |
There was a problem hiding this comment.
Rename the settings section title to avoid trust-boundary confusion.
The heading says “System prompt”, but this field is sent as user-turn input. That label can mislead learners about authority and override behavior. Use wording like “Custom Instructions” or “Response preferences” instead.
Suggested patch
- <h3 class="label-small settings-section__title">System prompt</h3>
+ <h3 class="label-small settings-section__title">Custom Instructions</h3>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@public/app.js` around lines 471 - 476, The heading "System prompt" in the h3
element with class settings-section__title is misleading because the
customInstructionsEl textarea field is for user-turn input, not system-level
prompts. Change the heading text from "System prompt" to "Custom Instructions"
or "Response preferences" to accurately reflect that this field contains user
customizations rather than system-level configuration, which will prevent
confusion about authority and override behavior.
Summary
Adds a user-facing Custom Instructions field so learners can shape Cosmo's tone, style, persona, and expertise — the "system prompt" practice taught in the Advanced Prompt Engineering course.
The key design decision: the user's text is delivered in the user turn, not the system prompt, so it can never override Cosmo's guardrails. It is also opt-in (hidden by default), so it only appears in practices that explicitly enable it.
Why deliver instructions in the user turn (security rationale)
An obvious-but-unsafe approach would inject the learner's text into the system prompt at "highest priority", above the guardrails. That is a prompt-injection / jailbreak hole: a learner could type "ignore all previous instructions and guardrails…" and it would carry system-level authority over Cosmo's own safety rules.
LLMs treat system > user authority by construction. By delivering untrusted, user-typed instructions in the user turn (wrapped with a note that they are user-level preferences, subordinate to the system prompt), the real system prompt and guardrails outrank them structurally — rather than relying on self-contradictory "this is highest priority but also obey the guardrails" wording. Bonus: instructions ride with each message, so they apply to the next reply without forcing a new chat.
Trusted vs untrusted split
chat-config.jsonsystemPromptExtrastays in the system prompt as trusted, course-author configuration, but now explicitly ranks below the guardrails (defense in depth).CUSTOM_INSTRUCTIONS), never in the system prompt.Opt-in gating
"showCustomInstructions": trueinchat-config.json.hideSettings: truealready hides it) and is never persisted — it resets to empty on each page load, so previous chats are never affected retroactively.Changes
Agent definition
agents/cosmo-tutor/protocol.yaml— add optionalCUSTOM_INSTRUCTIONSto theuser-messagetrigger and handler input.uiContentstaysUSER_MESSAGE, so the rendered chat bubble is unchanged.agents/cosmo-tutor/prompts/user-message.md— renderCUSTOM_INSTRUCTIONSwith an explicit precedence/refusal note.agents/cosmo-tutor/prompts/system.md— replace the old "HIGHEST PRIORITY" extra-instructions section with an absolute priority order: guardrails > trusted config > user custom instructions, and instruct refusal of override attempts.Backend
lib/helpers.js—EXTRA_INSTRUCTIONSnow comes only from trustedsystemPromptExtra(comment clarifies the trust boundary).Frontend
public/app.js— Custom Instructions textarea in Settings (gated byshowCustomInstructions); send the value with each message (initial send + regenerate/edit resend); no session-creation plumbing.public/app.css— textarea styling.Docs
README.md— documentshowCustomInstructionsand clarifysystemPromptExtrais trusted config that still ranks below the guardrails.Test plan
"showCustomInstructions": trueinchat-config.json, reload → Custom Instructions field appears in Settings.showCustomInstructions: false, reload → field disappears; noCUSTOM_INSTRUCTIONSis sent.npm test→ 100/100 pass; bundle builds clean.Deployment note
The agent prompt/protocol changed. The dev agent (
cosmo-tutor-dev) is already synced. Prod is not — runnode scripts/deploy-agent.mjs prodafter merge for the behavior to reach real users.Made with Cursor