Add YAML-aware from_pretrained scaling + runtime transform wiring by anujgupt-github · Pull Request #1072 · quic/efficient-transformers

anujgupt-github · 2026-06-11T21:26:23Z

Summary

- add qeff_layer_scale_yaml handling in the safetensor materialization/load path so from_pretrained(...) can apply YAML tensor scaling for both standard and streaming checkpoint loads
- inject layer-scale metadata onto loaded configs so runtime transforms pick up scaling/descaling placement without requiring pre-scaled snapshots
- add helper APIs for loaded-wrapper precision-recovery workflows (resolve_model_card_from_loaded_qeff_model, run_precision_recovery_agent_from_loaded_qeff_model)
- add regression coverage for:
  - YAML scaling on non-streaming and streaming from_pretrained
  - runtime transform wiring for Qwen3.5-MoE from YAML-loaded models
  - loaded-wrapper model-card resolution for precision-recovery agent
- keep Qwen3-VL-MoE tensor orientation compatibility fix required by quickcheck parity
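
For reviewers unfamiliar with the idea, the load-time scaling and metadata injection described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the YAML schema is assumed to reduce to a flat mapping of tensor name to scale factor, tensors are modelled as plain Python lists, and `ToyConfig`, `layer_scale_metadata`, `apply_layer_scales`, and `descale_output` are hypothetical names that do not exist in QEfficient.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: the parsed qeff_layer_scale_yaml content boils down to
# "tensor name -> multiplicative scale factor". The real schema may differ.
LayerScales = Dict[str, float]


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a loaded model config."""

    # Hypothetical attribute: records which tensors were scaled and by how much.
    layer_scale_metadata: Dict[str, float] = field(default_factory=dict)


def apply_layer_scales(
    state_dict: Dict[str, List[float]],
    scales: LayerScales,
    config: ToyConfig,
) -> Dict[str, List[float]]:
    """Scale listed tensors at load time and record the factors on the config."""
    scaled: Dict[str, List[float]] = {}
    for name, values in state_dict.items():
        factor = scales.get(name, 1.0)  # tensors not listed are left unchanged
        scaled[name] = [v * factor for v in values]
        if name in scales:
            config.layer_scale_metadata[name] = factor
    return scaled


def descale_output(values: List[float], name: str, config: ToyConfig) -> List[float]:
    """Undo the load-time scale at runtime using the metadata on the config."""
    factor = config.layer_scale_metadata.get(name, 1.0)
    return [v / factor for v in values]
```

In this sketch the same `apply_layer_scales` call could be made once on a full state dict or once per shard as shards arrive, which is one plausible way a standard and a streaming load could share the scaling logic. Carrying the factors on the config is what would let a runtime transform place the matching descale without needing a pre-scaled snapshot.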

/home/anujgupt/qeff_env2/bin/python -m ruff format --check QEfficient/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py QEfficient/utils/__init__.py QEfficient/utils/layer_scale_checkpoint.py QEfficient/utils/precision_recovery_agent.py QEfficient/utils/safetensor_materializer.py tests/base/test_safetensor_materializer.py tests/unit_test/models/test_model_quickcheck.py tests/utils/test_precision_recovery_agent.py
/home/anujgupt/qeff_env2/bin/python -m ruff check QEfficient/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py QEfficient/utils/__init__.py QEfficient/utils/layer_scale_checkpoint.py QEfficient/utils/precision_recovery_agent.py QEfficient/utils/safetensor_materializer.py tests/base/test_safetensor_materializer.py tests/unit_test/models/test_model_quickcheck.py tests/utils/test_precision_recovery_agent.py
/home/anujgupt/qeff_env2/bin/python -m pytest tests/base/test_safetensor_materializer.py -q
/home/anujgupt/qeff_env2/bin/python -m pytest tests/utils/test_precision_recovery_agent.py -q
HF_HUB_CACHE=/home/anujgupt/.cache/hf_hub /home/anujgupt/qeff_env2/bin/python -m pytest tests/unit_test/models/test_model_quickcheck.py -q

Full quickcheck is green with a user-writable HF cache (176 passed, 3 skipped).
This PR is based on the PR1052 branch (debug/whole-model-memory-pr1047) because it depends on the materialization infrastructure introduced there.

Signed-off-by: Anuj Gupta <anujgupt@users.noreply.github.com>

anujgupt added 2 commits June 12, 2026 01:15

Add Qwen3.5-MoE scale transform wiring and agentic precision infra

9912aeb

Add from_pretrained YAML scaling load path and runtime wiring tests

b5e3506
