A large-scale collection of real-world transactional prompts extracted from GitHub repositories, accompanied by a comprehensive ontology, rich structural and semantic annotations, and an interactive browser for systematic exploration.
This repository accompanies the paper:
Prompts in the Wild: A Large Analyzed Collection of Transactional Prompts in Code
The 20th Linguistic Annotation Workshop (LAW XX), ACL 2026
📄 Paper: Read the paper
🌐 Interactive Browser: Explore the collection
📦 Dataset: (Coming soon)
This project presents a large collection of real-world transactional prompts extracted from open-source GitHub repositories, together with a rich ontology describing both their semantic properties (such as task, domain, prompting techniques, and languages) and their structural components (e.g., instruction blocks, input context and question, and output specifications).
The repository also includes an interactive user interface that allows researchers to browse, search, filter and inspect prompts using the proposed ontology.
- 57,000+ transactional prompts extracted from real GitHub repositories
- Prompt-level metadata including repository information and timestamps
- Rich semantic and structural annotations, including:
  - task
  - domain
  - language
  - modality
  - prompting techniques
  - instruction blocks
  - and more
- Search and filtering over the prompt collection
- Downloadable filtered prompt subsets
Explore the collection directly in your browser:
The browser allows users to:
- search and filter prompts using ontology attributes
- inspect prompt structure
- highlight structural spans
- download filtered subsets
| Statistic | Value |
|---|---|
| Prompts | 57,640 |
| GitHub repositories | 34,249 |
| GitHub files processed | 53,630 |
For a complete analysis, see the accompanying paper.
The ontology captures multiple properties and components of transactional prompts, including:
- Task
- Domain
- Modality
- Languages
- Input and output specifications
- Prompting techniques
- Instruction blocks
A complete description of the ontology is available in the paper.
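As a rough illustration of how ontology attributes could drive filtering, here is a minimal Python sketch. The dataset has not been released, so everything in it is an assumption: the `AnnotatedPrompt` class, its field names (chosen to mirror the attribute list above, with input and output specifications omitted for brevity), the `filter_prompts` helper, and the sample values are hypothetical and are not taken from the collection or the paper.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedPrompt:
    """Hypothetical record shape; the released schema may differ."""
    text: str
    task: str
    domain: str
    modality: str
    languages: list[str] = field(default_factory=list)
    prompting_techniques: list[str] = field(default_factory=list)
    instruction_blocks: list[str] = field(default_factory=list)


def filter_prompts(prompts, **criteria):
    """Keep prompts that satisfy every criterion.

    Scalar attributes must equal the wanted value; list-valued
    attributes must contain it.
    """
    selected = []
    for prompt in prompts:
        for name, wanted in criteria.items():
            actual = getattr(prompt, name)
            if isinstance(actual, list):
                if wanted not in actual:
                    break
            elif actual != wanted:
                break
        else:
            # No criterion failed, so the prompt is kept.
            selected.append(prompt)
    return selected


# Invented sample records, for illustration only.
examples = [
    AnnotatedPrompt(
        text="Summarize the following article: {article}",
        task="summarization",
        domain="news",
        modality="text",
        languages=["English"],
        prompting_techniques=["zero-shot"],
    ),
    AnnotatedPrompt(
        text="Translate to French. Example: 'cat' -> 'chat'. Input: {word}",
        task="translation",
        domain="general",
        modality="text",
        languages=["English", "French"],
        prompting_techniques=["few-shot"],
    ),
]

summaries = filter_prompts(examples, task="summarization")
french = filter_prompts(examples, languages="French", modality="text")
```

Combining several keyword criteria narrows the selection, which is roughly the behavior one would expect from attribute-based filtering; the interactive browser may implement it differently.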
Work on this project was supported by a VATAT grant from the Planning and Budgeting Committee of the Council for Higher Education in Israel, a Kamin grant from the Israel Innovation Authority (IIA), and ISF grant number 670/23.
