NVIDIA / TensorRT-LLM Public

Pull requests: NVIDIA/TensorRT-LLM

829 Open 10,621 Closed

Pull requests list

[https://nvbugs/6341070][fix] Pass kv_cache_free_gpu_memory_fraction=0.5 to TRTLLMWorker.init_with_new_llm…

#15495 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6329165][fix] In TestNemotronNanoV3.test_accuracy, mocker.patch.dict GSM8K.EVALUATE_KWARGS…

#15494 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[feat] gRPC: implement SubscribeKvEvents for KV-cache event streaming

#15493 opened Jun 19, 2026 by key4ng • Draft

[https://nvbugs/6341072][fix] Change the model_name fixture to "llama-models-v2/TinyLlama-1.1B-Chat-v1.0"…

#15492 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6337233][fix] In _build_fake_self, bind PyExecutor._is_stats_dummy_request onto the fake…

#15491 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6337238][fix] Test-only fix — use the existing get_hf_rope_theta helper, adapt the…

#15490 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6337231][fix] Replace all 9 self._is_stats_dummy_request(req) calls with…

#15489 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6335726][fix] In test_qwen_moe_routed_expert_multi_lora_varying_ranks, drop ranks…

#15488 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6337226][fix] Switch max(positive_hits) → min(positive_hits) in both the range and…

#15487 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6317600][fix] Add an early return at the head of _run_attention_warmup when…

#15486 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[None][feat] support per-layer mixed-precision MoE serving (GLM, Qwen3-MoE)

#15485 opened Jun 19, 2026 by joshua-hill

[https://nvbugs/6337224][fix] Update PERF_SANITY_DIR to include aggregated/; in recipe_to_server_config…

#15484 opened Jun 19, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6322076][fix] Added _init_nccl_x86_64_init_workaround() that sets NCCL_MNNVL_ENABLE=0 and…

#15483 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6336801][fix] Add the two skip_softmax_threshold_scale_factor_decode/prefill aliases to…

#15482 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6239637][fix] Unwaive Qwen3.5 cases on A100 platform

#15481 opened Jun 18, 2026 by nv-guomingz Collaborator

1 task done

[https://nvbugs/6322073][fix] Add _needs_x86_nccl_pp_workaround() and init_nccl_pp_workaround() helpers…

#15480 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6322045][fix] In triton_context, when max_q_len == max_kv_len (so cache_lens=0) and…

#15479 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6317074][fix] One-line bump of free_gpu_memory_fraction from 0.6 to 0.8 in…

#15477 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[#14679][fix] Fix fused-QKV TP sharding for Phi-3/Phi-4

#15475 opened Jun 18, 2026 by guan404ming Contributor

1 task done

[https://nvbugs/6276981][fix] Force the q-split + allgather code path whenever q_split_eligible=True (drop…

#15474 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[https://nvbugs/6315845][fix] Dual-location revert of #14851's protective-code removal: (1) pin…

#15472 opened Jun 18, 2026 by chenfeiz0326 Collaborator

2 tasks done

[https://nvbugs/6337235][test] Fix MX/GMS model loader fixtures

#15471 opened Jun 18, 2026 by chienchunhung Collaborator • Draft

[None][feat] Qwen-Image: load pre-quantized ModelOpt NVFP4/FP8 checkpoints

#15470 opened Jun 18, 2026 by jingyu-ml

[https://nvbugs/6327147][fix] Lower kv_cache_config.free_gpu_memory_fraction from 0.7 to 0.5 in the…

#15468 opened Jun 18, 2026 by tensorrt-cicd Collaborator

2 tasks done

[#15463][fix] Sync llm_args.enable_chunked_prefill with chunked-prefill fallback gates

#15467 opened Jun 18, 2026 by DhineshPonnarasan Contributor

8 tasks done
