⚡ Latent Reasoning Guidance for Parallel Code Translation

Train a Process Reward Model, use it to steer large-model code translation, and validate the resulting parallel programs on HeCBench.

📋 Overview

This repository contains the full research pipeline for latent-guided parallel code translation. We train a Process Reward Model (PRM) on validator-derived reward signals, use that PRM to choose among multiple latent or code candidates during large-model inference, and then validate the generated translations with the HeCBench toolchain.

The system is organized around one practical question:

Can a reward model trained from compiler/runtime feedback guide a large model toward more reliable parallel code translations?

This method is built for experiments involving serial C/C++, OpenMP, CUDA, and cross-API translation directions such as CUDA -> OpenMP or serial -> CUDA.

🧭 Pipeline at a Glance

flowchart LR
    A[Build / score code data] --> B[Train PRM]
    B --> C[PRM Branch Selection Test]
    C --> D[Run PRM-guided inference]
    D --> E[Validate generated translations]
    E --> F[validators_stats_*.txt]

Most users follow one of these paths:

Goal	Start here	What you run
Reproduce results from a released checkpoint	Inference	`coconut_large_model_inference/run_inference_w_prm_modal_code_parallel.sh`
Train your own PRM	PRM training	`PRM/run_pqm_qwen_code.sh` or `PRM/run_pqm_qwen_code_for_valid.sh`
Rebuild the training data	Dataset rebuild	`dataset/run_build_hecbench_dataset_modal.sh`

🏗️ Repository Structure

Parallax/
├── PRM/                          # PRM training, validation, vectors, checkpoints
│   ├── train_main.py              # Main PRM train/validate entry point
│   ├── run_pqm_qwen_code.sh       # Train on full code training data
│   ├── run_pqm_qwen_code_for_valid.sh
│   │                              # Train on filtered data without validation leakage
│   ├── run_pqm_qwen_validate_code.sh
│   │                              # Run the paper's branch-selection PRM validation test
│   ├── loss_graph.py              # Plot training loss
│   └── plot_distributions.py      # Plot PRM score distributions
├── coconut_large_model_inference/ # Large-model inference with optional PRM guidance
│   ├── run_inference_w_prm_modal_code_parallel.sh
│   │                              # Modal inference over translation directions
│   ├── inference_modal_with_code_option_with_retry.py
│   ├── coconut_w_prm.py
│   └── test_code_a2.jsonl         # Default code inference set
└── dataset/                      # HeCBench data building, validation, and scoring
    ├── full_eval_validation_modal.py
    ├── run_reeval.sh
    ├── run_build_hecbench_dataset_modal.sh
    └── run_make_clean_run_parallel.sh

✨ Key Features

🧠 PRM Training

Train a reward model over code-translation trajectories and validator scores.
Use latent vector caches instead of re-tokenizing or recomputing every sample.
Support full training data and a filtered split that removes validation examples.
Run two-phase training with early backbone freezing and later unfreezing.
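
The two-phase idea can be pictured as a learning-rate schedule for the backbone. The sketch below is illustrative only: it borrows the `--unfreeze-step` and `--backbone-lr-factor` names from the training wrapper, but the function itself is hypothetical, and `train_main.py` may implement freezing differently (for example by toggling `requires_grad`). Treating the backbone as unfrozen exactly at `unfreeze_step` is also an assumption.

```python
def backbone_lr(step: int, base_lr: float, unfreeze_step: int,
                backbone_lr_factor: float) -> float:
    """Hypothetical backbone learning rate at a given optimizer step.

    Phase 1 (step < unfreeze_step): backbone frozen, modeled here as lr 0.
    Phase 2 (step >= unfreeze_step): backbone trains at a scaled-down rate.
    """
    if step < unfreeze_step:
        return 0.0
    return base_lr * backbone_lr_factor


if __name__ == "__main__":
    # Print the schedule around an assumed unfreeze point of 100 steps.
    for step in (0, 99, 100, 500):
        print(step, backbone_lr(step, base_lr=1e-4, unfreeze_step=100,
                                backbone_lr_factor=0.1))
```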

🔎 PRM Validation Using the Branch-Selection Test

Use PRM/run_pqm_qwen_validate_code.sh for the branch-selection test reported in the paper.
Validate checkpoints against PRM/branch_selection_test.jsonl.
Control tolerance-band metrics through the validation wrapper.
Keep validation vectors separate in PRM/vectors_code_validate_new/.

🚀 PRM-Guided Inference

Generate multiple candidates per latent/code step.
Score candidates with the PRM during inference.
Run multiple translation directions in parallel on Modal GPUs.
Produce inference_result_*.jsonl files for downstream HeCBench validation.
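
The candidate-selection step can be sketched as a best-of-N choice. This is a minimal illustration, not the logic in `coconut_w_prm.py`: `select_best` and the toy scorer are hypothetical, and the real PRM scores latent or code candidates with a trained model rather than a plain Python function. Breaking ties in favor of the earliest candidate is an assumption.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_best(candidates: Sequence[T],
                score_fn: Callable[[T], float]) -> Tuple[T, float]:
    """Return the highest-scoring candidate and its score.

    Ties are broken in favor of the earliest candidate (an assumption).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    scored = [(score_fn(c), i) for i, c in enumerate(candidates)]
    # Highest score first; among equal scores, the smallest index wins.
    best_score, best_idx = max(scored, key=lambda p: (p[0], -p[1]))
    return candidates[best_idx], best_score


if __name__ == "__main__":
    # Toy stand-in for the PRM: longer strings score higher.
    print(select_best(["a", "bbb", "cc"], len))
```

Raising the number of candidates widens the pool that `score_fn` sees, at the cost of one extra scoring call per candidate, which matches the speed trade-off of `N_CANDIDATES`.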

✅ HeCBench Evaluation

Clean generated code, build inference training rows, and run validators.
Produce per-run validator logs and validators_stats_*.txt summaries.
Support manual make/clean checks for borderline or missing compiled files.

🚀 Quick Start: Reproduce with a PRM Checkpoint

Use this path when you already have a trained PRM checkpoint.

1. Place the checkpoint

Checkpoints are not bundled with the repository. Put the checkpoint under PRM/checkpoints/ and point scripts at the concrete checkpoint-* directory:

PRM/checkpoints/<run-name>/checkpoint-568/

2. Run PRM-guided inference

cd coconut_large_model_inference
bash run_inference_w_prm_modal_code_parallel.sh

Before running, update the important variables in coconut_large_model_inference/run_inference_w_prm_modal_code_parallel.sh:

Variable	What to set
`PRM_CHECKPOINT`	Path to your PRM checkpoint, usually `../PRM/checkpoints/<run>/checkpoint-*`.
`PRM_MODEL_ID`	Backbone model used to train the PRM. Current runs use `Qwen/Qwen2.5-Coder-7B-Instruct`.
`N_CANDIDATES`	Number of candidates scored by the PRM. Higher values are slower but give more choices.
`OUTPUT_PATH`	Name of the inference JSONL to create. Use a descriptive suffix.
`DIRECTIONS`	Translation directions, for example `cuda:omp`, `omp:cuda`, `serial:omp`, `serial:cuda`.
`MODAL_GPU_SPEC`	Modal GPU type/count, for example `H200:2` or `A100-80GB:4`.
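
Each `DIRECTIONS` entry has the form `source:target`. As a rough sketch of how such entries split into pairs, assuming they are passed as one whitespace-separated string (the wrapper may instead use a bash array; `parse_directions` is a hypothetical helper, not part of the repository):

```python
from typing import List, Tuple


def parse_directions(spec: str) -> List[Tuple[str, str]]:
    """Split 'cuda:omp serial:cuda' into [('cuda', 'omp'), ('serial', 'cuda')]."""
    pairs = []
    for item in spec.split():
        src, sep, dst = item.partition(":")
        if not sep or not src or not dst:
            raise ValueError(f"expected 'source:target', got {item!r}")
        pairs.append((src, dst))
    return pairs


if __name__ == "__main__":
    print(parse_directions("cuda:omp omp:cuda serial:omp serial:cuda"))
```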

The default inference set is coconut_large_model_inference/test_code_a2.jsonl.

3. Copy inference results for evaluation

Copy the generated inference_result_*.jsonl file into:

dataset/data/Datasets/HeCBench/

4. Run HeCBench validation

cd dataset
python full_eval_validation_modal.py

Important: dataset/full_eval_validation_modal.py currently selects the input file through the hard-coded suffix near the top of the file. Set it so this path exists:

data/Datasets/HeCBench/inference_result_modal<suffix>.jsonl

Validation produces:

dataset/data/Datasets/HeCBench/inference_output<suffix>.jsonl
dataset/logs/validators_results_<run_id>.txt
dataset/logs/validators_stats_<run_id>.txt

The final reported numbers are based on the generated validators_stats_*.txt files.
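
Because the suffix is hard-coded, it can help to confirm the expected file names before launching validation. The helper below is a hypothetical pre-flight sketch that only mirrors the path patterns listed above; it does not read or modify `full_eval_validation_modal.py`, and the `_my_run` suffix is a made-up example.

```python
from pathlib import Path

# Directory used for inference results in this README, relative to the repo root.
HECBENCH_DIR = Path("dataset") / "data" / "Datasets" / "HeCBench"


def expected_input_path(suffix: str) -> Path:
    """Input JSONL that validation expects for a given suffix."""
    return HECBENCH_DIR / f"inference_result_modal{suffix}.jsonl"


def expected_output_path(suffix: str) -> Path:
    """Scored JSONL that validation writes for the same suffix."""
    return HECBENCH_DIR / f"inference_output{suffix}.jsonl"


if __name__ == "__main__":
    suffix = "_my_run"  # hypothetical example value
    path = expected_input_path(suffix)
    print(path, "exists" if path.exists() else "is missing")
```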

5. Manually re-check uncertain failures

Some translations that appear as failures in validators_stats_*.txt may still be worth checking manually, especially when the validator log shows missing or borderline compiled outputs. For those cases, take the relevant generated run directories from the validator statistics/logs and add them to the LOCATIONS array in dataset/run_make_clean_run_parallel.sh.

Then rerun the generated code directly:

cd dataset
bash run_make_clean_run_parallel.sh

The script runs make clean run for each listed generated directory and writes the combined output to dataset/logs/make_clean_<suffix>.txt. Inspect that log by hand and update your final accounting for cases that compile and run successfully despite being missed or marked incorrectly by the automated stats.

🧪 Train Your Own PRM

All PRM training and validation lives in PRM/. The code in this section is based on the Process Q Model; we use it only as the base of our code, not as part of our method.

Option A: Train on the full training set

cd PRM
bash run_pqm_qwen_code.sh

This uses PRM/train_code_a3.jsonl.

Use this when you want the strongest model from all available training data. Do not use this checkpoint for a clean validation report if validation examples are included in the full training set.

Option B: Train without validation leakage

cd PRM
bash run_pqm_qwen_code_for_valid.sh

This uses PRM/train_code_a3_filtered.jsonl, which removes validation examples from training. This is the right path when you want validation on PRM/branch_selection_test.jsonl to be meaningful.
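
For readers who rebuild the data and need to regenerate a filtered file, the leakage filter amounts to dropping every training row that also appears in the validation set. The sketch below assumes each JSONL row carries a unique identifier under a key named `id`; that key name is hypothetical, and the released filtered file may have been produced with a different matching rule.

```python
import json
from typing import Iterable, List


def filter_training_rows(train_lines: Iterable[str],
                         validation_lines: Iterable[str],
                         key: str = "id") -> List[str]:
    """Keep only training rows whose `key` is absent from the validation rows."""
    val_keys = {json.loads(line)[key] for line in validation_lines if line.strip()}
    kept = []
    for line in train_lines:
        if not line.strip():
            continue  # skip blank lines
        if json.loads(line)[key] not in val_keys:
            kept.append(line)
    return kept


if __name__ == "__main__":
    train = ['{"id": "k1", "score": 1.0}', '{"id": "k2", "score": 0.0}']
    val = ['{"id": "k2"}']
    print(filter_training_rows(train, val))
```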

Training parameters you will probably change

The shell wrappers call PRM/train_main.py. Edit the wrapper first; only edit train_main.py when you are changing the training code itself.

Parameter	Where	What it controls
`MODEL_PATH` / `--model-path`	wrapper + `train_main.py`	Hugging Face model ID or local model directory for the PRM backbone.
`CUDA_VISIBLE_DEVICES`	wrapper	GPU IDs allocated to the run. `NPROC_PER_NODE` is computed from this list.
`--train-jsonl`	wrapper	Training JSONL. Use `train_code_a3.jsonl` for full training or `train_code_a3_filtered.jsonl` for validation-clean training.
`--vector-base-dir`	wrapper	Directory used to resolve vector files referenced in the JSONL. Code PRM runs usually use `PRM/vectors_code`.
`--save-path`	wrapper	Output directory for checkpoints and profiler outputs. Change it for each important experiment.
`--checkpoint-path`	wrapper	Existing checkpoint to resume/load. Remove for a fresh run, or point to a concrete `checkpoint-*` directory.
`--loss-type`	wrapper	Objective: `mse`, `huber`, `rank`, `orm`, or `bce`. Current code PRM runs use `mse`.
`--zeta`	wrapper	Objective-specific shaping/scaling parameter used by the current loss.
`--two-phase`	wrapper	Freezes part of the model early, then unfreezes later. Useful for vector-input PRM training.
`--unfreeze-step`	wrapper	Step where the backbone is unfrozen during two-phase training.
`--backbone-lr-factor`	wrapper	Multiplier applied to the backbone learning rate after unfreezing.
`--num-epochs`	wrapper	Number of training epochs.
`--effective-batch-size`	wrapper	Total effective batch size across all GPUs after gradient accumulation.
`--per-device-batch-size`	wrapper	Samples per GPU per forward pass. Increase only if VRAM allows it.
`--score-threshold`	wrapper	Score below this value is treated as a negative step. Current scripts use `0.5`.
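
The batch-size parameters interact with the GPU list. As a rough sketch, assuming the usual relation effective batch = per-device batch × number of processes × gradient-accumulation steps (the exact computation in `train_main.py` is not confirmed here, and both helper names are hypothetical):

```python
def nproc_from_visible_devices(cuda_visible_devices: str) -> int:
    """Count GPU IDs in a CUDA_VISIBLE_DEVICES-style string such as '0,1,2'."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def grad_accum_steps(effective_batch_size: int, per_device_batch_size: int,
                     nproc: int) -> int:
    """Accumulation steps needed to reach the effective batch size."""
    per_step = per_device_batch_size * nproc
    if per_step <= 0 or effective_batch_size % per_step != 0:
        raise ValueError("effective batch size must be a positive multiple of "
                         "per-device batch size x number of processes")
    return effective_batch_size // per_step


if __name__ == "__main__":
    nproc = nproc_from_visible_devices("0,1,2")
    print(nproc, grad_accum_steps(48, 4, nproc))
```

Under this assumption, halving `--per-device-batch-size` to fit in VRAM doubles the accumulation steps while leaving the effective batch size unchanged.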

Good defaults to review before every run:

MODEL_PATH="Qwen/Qwen2.5-Coder-7B-Instruct"
export CUDA_VISIBLE_DEVICES=0,1,2
--train-jsonl "${SCRIPT_DIR}/train_code_a3_filtered.jsonl"
--vector-base-dir "${SCRIPT_DIR}/vectors_code"
--save-path "${SCRIPT_DIR}/checkpoints/<new-run-name>"

✅ Run the Paper's Branch-Selection Test

cd PRM
bash run_pqm_qwen_validate_code.sh

This wrapper is the PRM validation / branch-selection test used in the paper. With the checked-in defaults it evaluates a checkpoint on the released validation JSONL and vector cache. If you rebuild the branch-selection dataset from scratch in the section below, point this same wrapper at that rebuilt JSONL and vectors directory to reproduce the paper numbers end to end.

Update these values in PRM/run_pqm_qwen_validate_code.sh:

Variable / argument	What it should point to
`VAL_JSONL`	JSONL scored by the branch-selection wrapper. The checked-in default is `PRM/branch_selection_test.jsonl`.
`CHECKPOINT_PATH`	Checkpoint to evaluate, usually from the filtered training run.
`--vector-base-dir`	Vector cache for the same branch-selection dataset. The checked-in default is `PRM/vectors_code_validate_new`.
`TOLERANCE` / `--tolerance`	Tolerance band for the validation metric.
`CUDA_VISIBLE_DEVICES`	GPUs used for validation.

The validation wrapper calls PRM/train_main.py with --validate, so it evaluates a checkpoint rather than continuing training. This is the same entry point used for the branch-selection numbers discussed in the dataset rebuild section below.
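
One plausible reading of a tolerance-band metric is sketched below: the PRM's pick counts as a hit when its validator score is within `tolerance` of the best available branch. This definition is an assumption made for illustration and is not confirmed by the repository; check `train_main.py` for the metric actually computed with `--validate`.

```python
from typing import Sequence


def branch_selection_hit(prm_scores: Sequence[float],
                         true_scores: Sequence[float],
                         tolerance: float) -> bool:
    """True if the branch ranked highest by the PRM is near-optimal.

    prm_scores:  PRM score per branch (higher means preferred).
    true_scores: validator-derived score per branch, in the same order.
    """
    if not prm_scores or len(prm_scores) != len(true_scores):
        raise ValueError("score lists must be non-empty and equal in length")
    picked = max(range(len(prm_scores)), key=lambda i: prm_scores[i])
    return max(true_scores) - true_scores[picked] <= tolerance


if __name__ == "__main__":
    # The PRM prefers branch 1, whose true score is 0.05 below the best branch.
    print(branch_selection_hit([0.2, 0.9, 0.5], [1.0, 0.95, 0.0], tolerance=0.1))
```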

📈 Monitor Training

Two utility scripts help inspect a run:

Script	Purpose
PRM/loss_graph.py	Plot the training loss curve.
PRM/plot_distributions.py	Plot PRM score distributions.

Use these to check whether training is stable and whether the PRM scores separate successful and unsuccessful candidates in a useful way.

🛠️ Rebuild or Re-score the Dataset

Most users can skip this section. The released training JSONL files are enough for PRM training and validation.

Re-score existing translated code

cd dataset
bash run_reeval.sh

Before launching, check dataset/run_reeval.sh:

Parameter	Meaning
`GPUS`	Local GPU IDs available for validation.
`TOTAL`	Number of entries to process.
`MAX_PARALLEL`	Number of validator jobs to run per GPU.

Compressed assets may need to be decompressed before use:

dataset/data/Datasets/HeCBench/translations_redo_a2_full.jsonl.gz

Regenerate translations for training data

cd dataset
bash run_build_hecbench_dataset_modal.sh

This is the slowest part of the pipeline. It runs generation on Modal and local HeCBench validation on your machine or cluster.

Parameters likely to change in dataset/run_build_hecbench_dataset_modal.sh:

Parameter	Meaning
`COCONUT_CONFIG`	Path to the Coconut code-translation config. After the folder rename, point into `coconut_large_model_inference/`.
`CUDA_VISIBLE_DEVICES`	Local GPU used for validation.
`FROM_API` / `TO_API`	Translation direction for dataset generation.
`MODAL_GPU`	Modal GPU spec used for generation.
`--split`	Dataset split passed to `scripts/run_build_dataset_hecbench.py`.
`--num-resamples`	Number of generated alternatives per item.
`--max-workers`	Local parallelism for build/validation.

Rebuild the branch-selection (PRM validation) data

The branch-selection experiment measures whether the PRM can pick the best continuation when the large model branches off at every latent step. To reproduce that dataset, run the 4-direction wrapper:

cd dataset
bash run_build_validation_dir_4runs_modal.sh
# resume an interrupted run:
bash run_build_validation_dir_4runs_modal.sh --resume

What the script does:

Runs Coconut on Modal for 4 translation directions in parallel (cuda→omp, omp→cuda, serial→cuda, serial→omp), LIMIT kernels each.
For every kernel it produces the original translation and one continuation per (latent_vector_index, resample) branch-off.
With SAVE_CONTINUATION_VECTORS=1 (default), the full latent trajectory of each continuation is dumped as vectors/{capture_id}_v{i}_r{j}.pt alongside the original vectors/{capture_id}.pt. These are the inputs the PRM scores in the branch-selection test.
Local HeCBench validation is skipped (--skip-validation); scoring those branches is done later by the PRM, not the compiler.

Output layout under dataset/data/validation_dir/:

data/validation_dir/
├── dataset.jsonl                 # one row per kernel/direction
├── vectors/
│   ├── <capture_id>.pt           # original translation latent vectors  [K, dim]
│   └── <capture_id>_v{i}_r{j}.pt # continuation latent vectors          [K, dim]
└── logs/validation_dir_<ts>/build_<from>_<to>.log
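
The naming scheme above makes it possible to group each original trajectory with its continuations. The sketch below parses file names only; it does not load the `.pt` tensors, and `group_vectors` is a hypothetical helper rather than part of the repository.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches continuation files named <capture_id>_v{i}_r{j}.pt
_CONTINUATION = re.compile(r"^(?P<capture_id>.+)_v(?P<i>\d+)_r(?P<j>\d+)\.pt$")


def group_vectors(filenames: Iterable[str]) -> Dict[str, dict]:
    """Map capture_id -> {'original': name or None, 'continuations': [(i, j, name)]}."""
    groups = defaultdict(lambda: {"original": None, "continuations": []})
    for name in filenames:
        match = _CONTINUATION.match(name)
        if match:
            groups[match["capture_id"]]["continuations"].append(
                (int(match["i"]), int(match["j"]), name))
        elif name.endswith(".pt"):
            groups[name[:-len(".pt")]]["original"] = name
    for group in groups.values():
        group["continuations"].sort()  # order by latent index, then resample
    return dict(groups)


if __name__ == "__main__":
    print(group_vectors(["k1.pt", "k1_v0_r1.pt", "k1_v0_r0.pt", "k2.pt"]))
```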

Variables you will probably edit in dataset/run_build_validation_dir_4runs_modal.sh:

Variable	Meaning
`COCONUT_CONFIG`	Coconut config. Default `../coconut_large_model_inference/args/code_translation_70b.yaml`. Pass the `7b.yaml`, `14b.yaml`, `30b.yaml`, or `70b.yaml` variant; `MEM_PER_GPU` is auto-derived from the choice.
`MODAL_GPU`	Modal GPU spec, e.g. `A100-80GB:4` (default), `H200:2`, `H200:3`. Per-GPU memory cap is computed from this.
`SAVE_CONTINUATION_VECTORS`	`1` (default) saves continuation `.pt` files; `0` skips them (you almost never want this for branch-selection data).
`LIMIT`	Kernels per direction. Default `15` (matches the released branch-selection set).
`OUTPUT_DIR`	Output root, default `data/validation_dir`.
`MAX_ATTEMPTS` / `RETRY_DELAY_SECS`	Retry policy for Modal OOM (`RETRY_EXIT_CODE=33`).
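
The retry policy can be pictured as a bounded loop keyed on the retry exit code. This sketch assumes that exit code 33 marks a retryable Modal OOM and that any other code ends the loop; the actual shell logic may differ, and `run_with_retry` is a hypothetical helper.

```python
import time
from typing import Callable

RETRY_EXIT_CODE = 33  # value named in the table above


def run_with_retry(run_once: Callable[[], int], max_attempts: int,
                   retry_delay_secs: float,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run_once until it returns a non-retry exit code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code != RETRY_EXIT_CODE:
            return code  # success or a non-retryable failure
        if attempt < max_attempts:
            sleep(retry_delay_secs)
    return RETRY_EXIT_CODE  # every attempt hit the retryable failure


if __name__ == "__main__":
    outcomes = iter([33, 33, 0])  # simulated: two OOM exits, then success
    print(run_with_retry(lambda: next(outcomes), max_attempts=3,
                         retry_delay_secs=0.0))
```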

Override at the command line, for example:

COCONUT_CONFIG=../coconut_large_model_inference/args/code_translation_30b.yaml \
MODAL_GPU=H200:2 \
SAVE_CONTINUATION_VECTORS=1 \
bash run_build_validation_dir_4runs_modal.sh

After the build finishes, point the PRM validation wrapper at the produced vectors directory (set --vector-base-dir in PRM/run_pqm_qwen_validate_code.sh to dataset/data/validation_dir/vectors) and the corresponding JSONL to score every branch and reproduce the branch-selection numbers.

Prerequisites: the Coconut conda environment (the script runs conda activate LGPRM) and a working Modal account (modal token new). The hf-model-cache volume is created automatically on first run and caches the model weights between Modal cold starts.

📊 Outputs and Artifacts

Artifact	Produced by	Used for
`PRM/checkpoints/<run>/checkpoint-*`	PRM training	Validation and inference-time PRM scoring.
`PRM/vectors_code/`	Dataset/vector preparation	Training vector lookup.
`PRM/vectors_code_validate_new/`	Validation vector preparation	Validation vector lookup.
`coconut_large_model_inference/inference_result_*.jsonl`	PRM-guided inference	Input to HeCBench validation.
`dataset/data/Datasets/HeCBench/inference_output*.jsonl`	Final validation	Per-sample scored outputs.
`dataset/logs/validators_stats_*.txt`	Final validation	Reported run statistics.

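Some runs require one manual post-check: copy the relevant entries from the validation log into the LOCATIONS array in dataset/run_make_clean_run_parallel.sh, run the script, and inspect its output to identify translations that compile and run even though the automated stats reported them as missing or failed.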

🧯 Troubleshooting

Symptom	What to check
PRM checkpoint not found	Make sure scripts point to `PRM/checkpoints/<run>/checkpoint-*`, not just the parent run directory.
Validation uses the wrong inference file	Update the hard-coded `suffix` in dataset/full_eval_validation_modal.py.
Inference still points to old folders	Replace old `../My_PRM/...` paths with `../PRM/...` in inference wrappers.
Dataset build config is missing	Update old `coconut_large_model` paths to `coconut_large_model_inference`.
CUDA out of memory	Reduce `--per-device-batch-size`, reduce candidates, or use fewer/larger GPUs depending on the stage.
Validation results look too optimistic	Train with PRM/train_code_a3_filtered.jsonl before validating on PRM/branch_selection_test.jsonl.

🧾 Reproducing Final Numbers

To reproduce an existing run from a released PRM checkpoint:

Place the checkpoint under PRM/checkpoints/.
Update PRM_CHECKPOINT in coconut_large_model_inference/run_inference_w_prm_modal_code_parallel.sh.
Run PRM-guided inference.
Copy the resulting inference_result_*.jsonl into dataset/data/Datasets/HeCBench/.
Update suffix in dataset/full_eval_validation_modal.py.
Run validation and use the resulting validators_stats_*.txt files.
Run the manual make/clean check when the validator logs indicate missing compiled files that may still qualify.

📜 License

See LICENSE.
