feat: enable QASYMM8_SIGNED→F32 in CpuGemmAssemblyDispatch #1297

Open
alvoron wants to merge 1 commit into ARM-software:main from alvoron:alvoron_qasymm8_signed_f32_dispatch

alvoron commented Jun 17, 2026

Two gaps in CpuGemmAssemblyDispatch blocked QASYMM8_SIGNED input from validating with an F32 output tensor, even though the underlying arm_gemm kernel GemmInterleaved<int8_t, int8_t, float, DequantizeFloat> already supports this combination on AArch64.
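For background, here is a minimal reference sketch of what an int8 × int8 → F32 GEMM with a float dequantization stage computes. It assumes one per-tensor float scale and zero offsets. The function name reference_gemm_s8_f32 is hypothetical, and this is not the arm_gemm GemmInterleaved implementation, which uses blocked, vectorised kernels:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference: C = scale * (A x B), with A (m x k) and B (k x n)
// stored row-major as signed 8-bit values. Products are accumulated in int32
// (as integer GEMM kernels do) and converted to float once per output element.
std::vector<float> reference_gemm_s8_f32(const std::vector<int8_t> &a,
                                         const std::vector<int8_t> &b,
                                         std::size_t m, std::size_t n, std::size_t k,
                                         float scale)
{
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p)
            {
                acc += static_cast<int32_t>(a[i * k + p]) * static_cast<int32_t>(b[p * n + j]);
            }
            // Dequantize: a single float multiply on the int32 accumulator.
            c[i * n + j] = static_cast<float>(acc) * scale;
        }
    }
    return c;
}
```

Accumulating in int32 matters: a single product such as -128 × -128 = 16384 already exceeds the int8 range.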

Gap 1 - has_opt_impl: The QASYMM8_SIGNED branch only tested for S32 and S8 outputs. Passing F32 fell through to the S8S8 Requantize32 check, which fails because it queries has_opt_gemm<int8_t, int8_t, int8_t, Requantize32>, a different instantiation from the has_opt_gemm<int8_t, int8_t, float, DequantizeFloat> that an F32 output needs.
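The fall-through in Gap 1 can be illustrated with a small standalone model. Everything here is hypothetical: the tag structs stand in for the arm_gemm output stages named above, has_opt_gemm_model is not the library's has_opt_gemm, and availability is simplified to "the instantiation's output type TC matches the requested output type". In this model a Requantize32 stage produces 8-bit output, so it can never answer for F32:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the output-stage tags named in the PR.
struct Nothing {};         // raw int32 accumulators
struct Requantize32 {};    // int32 -> int8 requantization
struct DequantizeFloat {}; // int32 -> float dequantization

enum class DataType { S8, QASYMM8_SIGNED, S32, F32 };

// Hypothetical model of has_opt_gemm<TA, TB, TC, OutputStage>: each
// combination is a separate instantiation that only answers for itself,
// here simplified to "does my output type TC match the requested output?".
template <typename TA, typename TB, typename TC, typename OutputStage>
bool has_opt_gemm_model(DataType out)
{
    switch (out)
    {
        case DataType::S8:
        case DataType::QASYMM8_SIGNED:
            return std::is_same<TC, int8_t>::value;
        case DataType::S32:
            return std::is_same<TC, int32_t>::value;
        case DataType::F32:
            return std::is_same<TC, float>::value;
    }
    return false;
}

// Before the fix: anything that is not S32 falls through to Requantize32.
bool has_opt_impl_s8_before(DataType out)
{
    if (out == DataType::S32)
        return has_opt_gemm_model<int8_t, int8_t, int32_t, Nothing>(out);
    return has_opt_gemm_model<int8_t, int8_t, int8_t, Requantize32>(out);
}

// After the fix: F32 gets its own DequantizeFloat branch.
bool has_opt_impl_s8_after(DataType out)
{
    if (out == DataType::S32)
        return has_opt_gemm_model<int8_t, int8_t, int32_t, Nothing>(out);
    if (out == DataType::F32)
        return has_opt_gemm_model<int8_t, int8_t, float, DequantizeFloat>(out);
    return has_opt_gemm_model<int8_t, int8_t, int8_t, Requantize32>(out);
}
```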

Gap 2 - validate: There was no output-type guard for QASYMM8_SIGNED input at all. The equivalent guard for QASYMM8 explicitly allows QASYMM8/S32/F32 outputs; QASYMM8_SIGNED had no such allowance, so an F32 output reached downstream checks and was rejected there without a clear error message.
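A minimal sketch of the kind of up-front output-type guard Gap 2 describes, assuming the allowed sets listed above (QASYMM8_SIGNED/S32/F32 for QASYMM8_SIGNED input, mirroring QASYMM8/S32/F32 for QASYMM8). check_output_type is a hypothetical name, and it returns a plain string rather than the library's own error-reporting type:

```cpp
#include <algorithm>
#include <initializer_list>

enum class DataType { QASYMM8, QASYMM8_SIGNED, S32, F32, F16 };

// Hypothetical guard: for a quantized input, list the accepted output types
// explicitly so any other pairing fails here with a specific message.
// Returns nullptr when the combination is allowed, otherwise an error string.
const char *check_output_type(DataType in, DataType out)
{
    auto one_of = [out](std::initializer_list<DataType> allowed) {
        return std::find(allowed.begin(), allowed.end(), out) != allowed.end();
    };
    if (in == DataType::QASYMM8 &&
        !one_of({DataType::QASYMM8, DataType::S32, DataType::F32}))
        return "QASYMM8 input only supports QASYMM8, S32 or F32 output";
    if (in == DataType::QASYMM8_SIGNED &&
        !one_of({DataType::QASYMM8_SIGNED, DataType::S32, DataType::F32}))
        return "QASYMM8_SIGNED input only supports QASYMM8_SIGNED, S32 or F32 output";
    return nullptr;
}
```

Rejecting unsupported pairings at the top of validation keeps the error specific to the data-type combination instead of letting it surface from an unrelated downstream check.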

Two gaps in the assembly dispatch layer prevented QASYMM8_SIGNED input
from producing F32 output:

1. has_opt_impl() had no branch for F32 output when input is S8/
   QASYMM8_SIGNED, causing spurious kernel-not-found errors.  Add a
   DequantizeFloat branch mirroring the existing S32 branch.

2. validate() rejected F32 output for QASYMM8_SIGNED input because it
   had no explicit allowance for that combination.  Add a guard that
   permits QASYMM8_SIGNED/S32/F32 as output types (matching the already-
   existing QASYMM8 guard).

AsmGemmInfo also gains dequant_a_offset / dequant_b_offset fields so that
callers can pass quantization zero-points to create_arm_gemm_dequant;
existing callers need no changes.
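Why zero-points need to reach the dequantization path can be shown with a short sketch. It assumes asymmetric quantization where real = scale · (q − zero_point), and a combined scale (scale_a · scale_b) passed as one float. Whether ACL stores these offsets as the zero-point or its negation is not stated here and is an assumption, as are both function names. The second function shows one standard technique: accumulate raw int8 products, then correct with row and column sums.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: dot product of one row of A and one column of B (same length),
// subtracting the zero-points before multiplying.
float dequant_dot_direct(const std::vector<int8_t> &a, const std::vector<int8_t> &b,
                         int32_t a_zero, int32_t b_zero, float scale)
{
    int32_t acc = 0;
    for (std::size_t p = 0; p < a.size(); ++p)
        acc += (a[p] - a_zero) * (b[p] - b_zero);
    return static_cast<float>(acc) * scale;
}

// Hypothetical offset-correction form: accumulate raw products (what an integer
// GEMM kernel computes), then apply the identity
//   sum (a - za)(b - zb) = sum ab - zb*sum a - za*sum b + k*za*zb
float dequant_dot_corrected(const std::vector<int8_t> &a, const std::vector<int8_t> &b,
                            int32_t a_zero, int32_t b_zero, float scale)
{
    int32_t raw = 0, sum_a = 0, sum_b = 0;
    for (std::size_t p = 0; p < a.size(); ++p)
    {
        raw += a[p] * b[p];
        sum_a += a[p];
        sum_b += b[p];
    }
    const int32_t k = static_cast<int32_t>(a.size());
    const int32_t acc = raw - b_zero * sum_a - a_zero * sum_b + k * a_zero * b_zero;
    return static_cast<float>(acc) * scale;
}
```

Both functions produce the same result; the corrected form lets the inner loop stay a plain int8 dot product.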

Also fix the __aarch64_ typo (the predefined macro is __aarch64__) in the
DequantFP32_SupportedTypes test guard, so the test actually runs on AArch64
targets.
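The typo disabled the test silently rather than breaking the build because #ifdef on a name that is not defined is simply false. A minimal sketch (the variable names are hypothetical); GCC and Clang predefine __aarch64__ when targeting AArch64:

```cpp
// Misspelled guard: no compiler predefines this name, so this branch is
// compiled out everywhere and no error is reported.
#ifdef __aarch64_
constexpr bool typo_guard_enabled = true;
#else
constexpr bool typo_guard_enabled = false;
#endif

// Correct guard: true only when compiling for AArch64.
#ifdef __aarch64__
constexpr bool correct_guard_enabled = true;
#else
constexpr bool correct_guard_enabled = false;
#endif
```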

Signed-off-by: Aleksandr Voron <aleksandr.voron@intel.com>