dalisoft/hf-dataset-distill

hf-dataset-distill

A project that helps distill models by generating a dataset through the provider's Batch API (50% discount)
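To make the discount concrete, here is a minimal sketch of the arithmetic. It assumes only what the description states, namely that batch requests cost 50% less than standard ones. The function name `estimateCostUsd`, the token count, and the per-million-token price are hypothetical placeholders, not values from this repository or from any provider's price list.

```typescript
// Assumption: batch requests are billed at 50% of the standard rate,
// as stated in the project description.
const BATCH_DISCOUNT = 0.5;

// Hypothetical helper (not part of this repository): estimates the cost
// of processing `tokens` tokens at `usdPerMillionTokens`.
function estimateCostUsd(
  tokens: number,
  usdPerMillionTokens: number,
  batch: boolean,
): number {
  const standard = (tokens / 1_000_000) * usdPerMillionTokens;
  return batch ? standard * (1 - BATCH_DISCOUNT) : standard;
}

// Placeholder numbers chosen only for illustration.
const tokens = 4_000_000;
const pricePerMillion = 2;

console.log(estimateCostUsd(tokens, pricePerMillion, false)); // 8
console.log(estimateCostUsd(tokens, pricePerMillion, true)); // 4
```

Check your provider's current pricing before relying on any estimate like this.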

Features

  • Easy to use commands
  • JS/TS Stack
  • Hugging Face compatible

Prerequisites

  • Bun installed
  • AI Provider API Key
  • Sufficient balance on your AI provider account
  • Source dataset

Commands

Installation

bun install

Running

Development:

bun run dev

Production:

bun run start

Build dataset

bun run export

A file should be ready in dataset/

Dataset statistics

bun run stat

Source dataset

PROMPT:

Generate a dataset for different/mixed purposes with focus on programming language.

Most of dataset should be:
- Rust
- Golang
- TypeScript/JavaScript/Node.js
- Also other programming languages

should be included.

Most output should be simpler, not verbose or expanded.
Only last 15% rows should be verbose with expanded details.

Output dataset entries: ~2000 rows
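The prompt asks for roughly 2000 rows with only the last 15% verbose. A small sketch of that split follows. The helper `verboseStartIndex` is hypothetical and not part of this repository, and it assumes zero-based row indices and an exact total of 2000.

```typescript
// Hypothetical helper (not part of this repository): returns the zero-based
// index of the first row that should be verbose, given that the last
// `verboseShare` fraction of rows is verbose.
function verboseStartIndex(totalRows: number, verboseShare = 0.15): number {
  const verboseRows = Math.round(totalRows * verboseShare);
  return totalRows - verboseRows;
}

const total = 2000; // assumed exact; the prompt says "~2000"
const start = verboseStartIndex(total);

// Prints: concise rows: 0..1699, verbose rows: 1700..1999
console.log(`concise rows: 0..${start - 1}, verbose rows: ${start}..${total - 1}`);
```

With these assumptions, about 1700 rows are concise and about 300 are verbose.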

License

Apache-2.0
