Add YAML-aware from_pretrained scaling + runtime transform wiring #1072

Open
anujgupt-github wants to merge 2 commits into debug/whole-model-memory-pr1047 from debug/agentic-scale-on-pr1052

Conversation

@anujgupt-github

Contributor

Summary

  • add qeff_layer_scale_yaml handling in the safetensor materialization/load path so from_pretrained(...) can apply YAML tensor scaling for both standard and streaming checkpoint loads
  • attach layer-scale metadata to loaded configs so runtime transforms pick up scaling/descaling placement without requiring pre-scaled snapshots
  • add helper APIs for loaded-wrapper precision-recovery workflows (resolve_model_card_from_loaded_qeff_model, run_precision_recovery_agent_from_loaded_qeff_model)
  • add regression coverage for:
    • YAML scaling on non-streaming and streaming from_pretrained
    • runtime transform wiring for Qwen3.5-MoE from YAML-loaded models
    • loaded-wrapper model-card resolution for the precision-recovery agent
  • keep the Qwen3-VL-MoE tensor-orientation compatibility fix required for quickcheck parity
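
To make the load-time scaling idea concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the spec layout, the glob-style tensor-name patterns, and the helper names (SCALE_SPEC, scale_for, apply_scales, descale) are all hypothetical, and the real qeff_layer_scale_yaml schema is not shown in this description. Tensors are plain lists of floats so the sketch runs without torch or a YAML parser.

```python
from fnmatch import fnmatchcase

# Hypothetical scale spec, roughly what a parsed YAML file might contain.
# The real qeff_layer_scale_yaml schema is an assumption here.
SCALE_SPEC = {
    "model.layers.*.mlp.down_proj.weight": 0.5,
    "model.layers.1.self_attn.o_proj.weight": 0.25,
}


def scale_for(name, spec):
    """Return the factor of the first pattern matching `name`, or 1.0."""
    for pattern, factor in spec.items():
        if fnmatchcase(name, pattern):
            return factor
    return 1.0


def apply_scales(tensors, spec):
    """Scale tensors one at a time, the way a streaming loader could.

    Returns the scaled tensors plus a {name: factor} record of every
    tensor that was actually changed.
    """
    scaled, applied = {}, {}
    for name, values in tensors.items():
        factor = scale_for(name, spec)
        scaled[name] = [v * factor for v in values]
        if factor != 1.0:
            applied[name] = factor
    return scaled, applied


def descale(values, factor):
    """Undo a scale factor; a stand-in for a runtime descale step."""
    return [v / factor for v in values]


# Toy checkpoint: tensor name -> flat list of weights.
checkpoint = {
    "model.layers.0.mlp.down_proj.weight": [2.0, 4.0],
    "model.layers.1.self_attn.o_proj.weight": [8.0],
    "model.embed_tokens.weight": [1.0, 3.0],
}

scaled, applied = apply_scales(checkpoint, SCALE_SPEC)
```

In this sketch the `applied` record plays the role of the layer-scale metadata: something downstream can read it to know which tensors were scaled and by how much, so the original checkpoint never has to be stored pre-scaled. Where the real runtime transforms place the descale is not specified here.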

Validation

  • /home/anujgupt/qeff_env2/bin/python -m ruff format --check QEfficient/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py QEfficient/utils/__init__.py QEfficient/utils/layer_scale_checkpoint.py QEfficient/utils/precision_recovery_agent.py QEfficient/utils/safetensor_materializer.py tests/base/test_safetensor_materializer.py tests/unit_test/models/test_model_quickcheck.py tests/utils/test_precision_recovery_agent.py
  • /home/anujgupt/qeff_env2/bin/python -m ruff check QEfficient/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py QEfficient/utils/__init__.py QEfficient/utils/layer_scale_checkpoint.py QEfficient/utils/precision_recovery_agent.py QEfficient/utils/safetensor_materializer.py tests/base/test_safetensor_materializer.py tests/unit_test/models/test_model_quickcheck.py tests/utils/test_precision_recovery_agent.py
  • /home/anujgupt/qeff_env2/bin/python -m pytest tests/base/test_safetensor_materializer.py -q
  • /home/anujgupt/qeff_env2/bin/python -m pytest tests/utils/test_precision_recovery_agent.py -q
  • HF_HUB_CACHE=/home/anujgupt/.cache/hf_hub /home/anujgupt/qeff_env2/bin/python -m pytest tests/unit_test/models/test_model_quickcheck.py -q

Notes

  • Full quickcheck is green with a user-writable HF cache (176 passed, 3 skipped).
  • This PR is based on the PR1052 branch (debug/whole-model-memory-pr1047) because it depends on the materialization infrastructure.


2 participants