LLM-NTT Examples

This repository packages extracted Number Theoretic Transform RTL variants from YATA and HOGE with small Verilator tests that compare against the current TFHEpp C++ reference headers.

variants/yata-raintt: YATA compressed 27-bit RAINTT NTT and INTT.
variants/hoge: merged HOGE Chisel sources for the streaming INTT/NTT wrappers, ExternalProduct forward-NTT oracle, and full-vector NTT/INTT identity pipeline.
third_party/TFHEpp: TFHEpp submodule used as the C++ reference.
docs/ntt-module-specs.md: top-level module specifications for generating replacement Verilog that passes the included tests.
tasks/: machine-readable benchmark task manifests for architecture search.
docs/architecture-search-space.md and docs/scoring.md: search knobs and evaluation rules.
examples/autontt/: AutoNTT-oriented mapping notes and custom reduction examples, including an LLM-based RTL candidate generator.

The copied YATA and HOGE RTL is AGPL-3.0 licensed. See NOTICE.md and licenses/.

Native Run

Install python3, sbt, cmake, ninja, clang++, and verilator, then run:

git submodule update --init --recursive
scripts/run_all.sh

The script generates Verilog with sbt run, configures CMake with Clang, builds the Verilator harnesses, and runs CTest.

Fresh-Clone HLS Reproduction

From a fresh clone, initialize submodules, build or reuse the SIF, generate HLS RTL with Vitis HLS, run the HLS functional checks, verify the emitted RTL directories, and write an AutoNTT-metric report with:

git clone --recurse-submodules https://github.com/virtualsecureplatform/LLM-NTT-Examples.git
cd LLM-NTT-Examples
scripts/reproduce_hls_autontt_metrics.py

The wrapper uses LLM_NTT_SIF when set, then an existing llm-ntt-rootless.sif, then llm-ntt.sif; if no image exists, it builds llm-ntt.sif from apptainer/llm-ntt.def. Vitis remains host-side, so /home/opt/xilinx/Vitis/2023.2/settings64.sh must be visible by default. Override these paths as needed:

scripts/reproduce_hls_autontt_metrics.py \
  --sif auto \
  --xilinx-root /home/opt/xilinx \
  --vitis-settings Vitis/2023.2/settings64.sh

If the host cannot use unprivileged Apptainer builds, add --sudo-sif-build. To reuse an already-built image during development, add --skip-sif-build. Results are written under build/reproduce-hls-autontt/<timestamp>/summary.json and report.md.

Apptainer Run

Build the container:

scripts/build_llm_ntt_sif.sh

Run the same build and test flow inside the container:

apptainer run --no-home --pwd /work --bind "$(pwd):/work" llm-ntt.sif

The %runscript expects the repository to be mounted at /work. The single-threaded squashfs argument avoids mksquashfs orderer failures observed on some unprivileged Apptainer hosts.

The image also carries the non-Xilinx dependencies used by the AutoNTT HLS path: libgflags-dev, libgoogle-glog-dev, OpenCL headers/libraries, and the Python TAPA frontend. It also installs the native non-Vitis headers/libraries commonly needed by TAPA/Pasta runtime builds: Boost coroutine/context/thread/stacktrace, nlohmann-json, tinyxml2, and yaml-cpp. Vitis remains a host-side licensed tool, and the base PyPI tapa package does not provide the full TAPA/Pasta C++ runtime (tapa.h, libtapa, libfrt) needed by AutoNTT's generated C-simulation link line.

For the rootless image with the full RapidStream TAPA runtime installed, build with:

scripts/build_llm_ntt_sif.sh \
  --with-tapa-runtime \
  --tapa-build-jobs 4 \
  --output llm-ntt-rootless.sif

If your Apptainer installation cannot use fakeroot, build with:

scripts/build_llm_ntt_sif.sh --sudo

With --sudo, the wrapper stages the SIF under SIF_TMPDIR, TMPDIR, or /tmp, changes ownership back to the caller, then moves it to --output inside the repository. Use --sudo-temp-dir DIR if /tmp is not suitable. Use --bind-xilinx when a build-time %post step needs the host Xilinx tree; this bind is read-only and does not copy Vitis into the image. To build and install the full RapidStream TAPA runtime inside a sudo-built image, bind the host Xilinx tree and enable the opt-in runtime build:

scripts/build_llm_ntt_sif.sh --sudo --with-tapa-runtime --tapa-build-jobs 2

This downloads Bazelisk, uses Bazel 8.4.2, clones rapidstream-tapa, patches its VARS.bzl to use /home/opt/xilinx version 2023.2, builds //:tapa-pkg-tar, and installs tapacc, tapa.h, libtapa, and libfrt under /opt/rapidstream-tapa in the SIF. It can take a long time and needs the build-time Vitis bind. Use --tapa-bazel-version VERSION if the TAPA branch requires a different Bazel release.

For an image-only sanity check after rebuilding:

apptainer exec --no-home --pwd /work --bind "$(pwd):/work" llm-ntt.sif \
  scripts/check_autontt_hls_deps.sh --image-only

For full AutoNTT HLS C-simulation/synthesis checks with the runtime-enabled SIF, bind the host Xilinx tree, then run the default checker:

apptainer exec --no-home --pwd /work \
  --bind "$(pwd):/work" \
  --bind /home/opt/xilinx:/home/opt/xilinx \
  llm-ntt-rootless.sif \
  scripts/check_autontt_hls_deps.sh

After the SIF and host Vitis tree are visible, run the generated HLS compile/synthesis comparison harness. The default platform is the installed U200 platform xilinx_u200_gen3x16_xdma_2_202110_1; pass --platform to target another installed platform:

scripts/run_autontt_hls_sif_compare.sh --sif llm-ntt-rootless.sif

It copies the latest generated HOGE custom AutoNTT HLS artifact, runs the full dependency check inside the SIF, runs make csim_compile, runs RapidStream tapa compile, and writes a timestamped summary.json and report.md under build/autontt-hls-sif-compare/.

Manual Steps

Generate Verilog only:

scripts/gen_verilog.sh

Build and test after Verilog generation:

cmake -S . -B build -G Ninja -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang
cmake --build build
ctest --test-dir build --output-on-failure

Evaluate a single benchmark task, using the extracted RTL as the baseline:

scripts/evaluate_candidate.sh --task hoge_streaming_intt_1024_p64

Evaluate candidate Verilog in a directory:

scripts/evaluate_candidate.sh \
  --task hoge_streaming_intt_1024_p64 \
  --verilog-dir candidate/hoge-intt

Add an optional flattened Yosys structural estimate:

scripts/evaluate_candidate.sh \
  --task hoge_externalproduct_ntt_1024_p64 \
  --with-yosys

Generate an AutoNTT-style LLM RTL candidate using an OpenAI-compatible endpoint:

scripts/autontt_llm_generate.py \
  --task hoge_nttid_1024_identity \
  --endpoint lab \
  --strategy behavioral_reference \
  --attempts 1

The generator writes prompts, responses, candidate Verilog, and evaluator results under build/llm-runs/. Use --plan-only to inspect the AutoNTT-style search points without calling the LLM, or --dry-run to write the prompt only. --endpoint lab reads the private endpoint from LLM_NTT_LAB_ENDPOINT; the endpoint can also be supplied directly with LLM_NTT_LLM_ENDPOINT. In this workspace, --endpoint kunashiri resolves to the llama.cpp OpenAI-compatible server at http://kunashiri:8080/v1; pass --disable-thinking for Qwen-style models that otherwise return reasoning_content before the requested JSON. The hoge_nttid_1024_identity command is only a plumbing smoke test because an identity implementation can satisfy its observable contract. For real functional NTT/INTT generation, use a task such as hoge_streaming_intt_1024_p64; the generator rejects trivial pass-through shortcuts for non-identity arithmetic tasks unless --allow-shortcuts is explicitly supplied.

To create a known-good AutoNTT-style run artifact from the extracted RTL, use the reference candidate source:

scripts/autontt_llm_generate.py \
  --task hoge_streaming_intt_1024_p64 \
  --candidate-source reference \
  --strategy hardware \
  --arch-type I \
  --modmul-type C

This does not count as LLM-generated arithmetic RTL. It copies the task's golden Verilog into the same run/evaluation layout so functional baselines and future LLM candidates can be compared with the same prepared tests.

For generated non-reference RTL baselines, use the built-in behavioral generators:

scripts/autontt_llm_generate.py \
  --task hoge_streaming_intt_1024_p64 \
  --candidate-source behavioral \
  --strategy hardware \
  --arch-type I \
  --modmul-type C

The HOGE INTT path emits a compact INTTWrap.v that implements the same cuHEpp::TwistINTT<uint32_t,10> observable contract and runs through the same prepared evaluator. Treat these outputs as functional behavioral RTL, not as optimized AutoNTT/Vitis-quality architectures.

To keep the endpoint in the generation loop without requiring it to emit a large arithmetic Verilog file verbatim, use the endpoint-guided behavioral source:

scripts/autontt_llm_generate.py \
  --task hoge_streaming_intt_1024_p64 \
  --endpoint kunashiri \
  --disable-thinking \
  --candidate-source llm_behavioral \
  --strategy hardware \
  --arch-type I \
  --modmul-type C

This mode asks the endpoint for a small JSON selection of a supported functional generator, validates that selection, emits the corresponding RTL locally, and evaluates it through the same prepared tests. The private endpoint is still supplied only through LLM_NTT_LAB_ENDPOINT.

For the local kunashiri llama.cpp server, the full endpoint-backed functional harness can be run with one command:

scripts/run_autontt_kunashiri_harness.sh

By default this writes artifacts to build/autontt-kunashiri-harness/, asks kunashiri to select each bounded RTL generator, emits the selected candidate RTL locally, and runs the prepared LLM-NTT tests through Apptainer. The aggregate pass/fail record is written to build/autontt-kunashiri-harness/summary.json. Add --task <task-id> to run a single task or --with-vitis --vitis-timeout SEC for optional Vivado/Vitis synthesis.

To try the adjacent AutoNTT HLS backend directly, use the HLS harness:

scripts/run_autontt_hls_harness.py --modmul-type B

This runs ../AutoNTT/automation_framework/AutoNTT.py, captures generated TAPA/Vitis HLS artifacts under build/autontt-hls-runs/<timestamp>/, and writes summary.json. The Barrett path is a positive code-generation control. To generate HOGE p64 custom-reduction HLS source, run:

scripts/run_autontt_hls_harness.py --modmul-type C

For custom reductions the harness defaults to --custom-bu-mode estimate, which supplies explicit estimated butterfly-unit attributes pipeline_depth,dsp,lut,ff = 15,32,2345,1481 to unblock AutoNTT source generation. The generated HLS source is useful for adapter work, but those BU attributes are not measured synthesis results. To run AutoNTT's original C-sim/TAPA/Autobridge custom-BU measurement probe, use:

scripts/run_autontt_hls_harness.py \
  --modmul-type C \
  --custom-bu-mode probe \
  --allow-failure

On this host the measured probe reaches AutoNTT's temp_design and is blocked until the generated C-simulation link line can see tapa.h, libtapa, libfrt, glog, gflags, OpenCL, and Vitis HLS headers. Use scripts/check_autontt_hls_deps.sh to distinguish the rebuilt image's non-Xilinx dependencies from the full TAPA/Pasta/Vitis runtime needed for synthesis. AutoNTT HLS artifacts are not yet Verilog candidates for the prepared tests; passing this repository's task manifests from that path still requires HLS-to-RTL synthesis plus an interface/order adapter for task tops such as INTTWrap and ExternalProductWrap.

YATA is not a direct AutoNTT backend input because the extracted task is N = 512. To generate an LLM-style YATA HLS candidate, test it against TFHEpp, synthesize it with Vitis HLS, and compare its HLS estimates against the extracted RTL reference using the AutoNTT metric script, run:

scripts/run_yata_hls_synth_compare.py --sif auto

For smaller HLS bring-up targets, generate and compare the 32-point HOGE path, the single-block 8-point YATA path, and the 8-lane by 8-cycle YATA path:

scripts/run_small_variant_hls_synth_compare.py --variants all --sif auto

The small-variant driver writes reference and generated HLS tops, checks both against TFHEpp-derived references, synthesizes INTT/NTT/combined tops with Vitis HLS, and compares the resulting results.json files with the same AutoNTT metric script. A verified U280 run produced:

Variant	INTT cycles	NTT cycles	LUT	FF	DSP	BRAM	fmax MHz
`hoge32`	4294	4292	69693	31644	40	12	342.466
`yata8`	81	96	17303	11938	70	8	342.466
`yata8x8`	5250	9168	65492	38962	156	8	305.157

Behavioral generation currently supports:

hoge_streaming_intt_1024_p64: correctness-scored HOGE INTT arithmetic.
hoge_nttid_1024_identity: correctness-scored identity smoke path.
hoge_streaming_ntt_1024_p64: standalone NTT wrapper interface/lint gate.
hoge_externalproduct_ntt_1024_p64: correctness-scored HOGE ExternalProduct forward-NTT arithmetic.
yata_raintt_512_p27: correctness-scored YATA RAINTT INTT/NTT arithmetic.

Run every built-in behavioral candidate through the prepared evaluator:

scripts/evaluate_behavioral_candidates.sh

Run every endpoint-guided behavioral candidate through the prepared evaluator:

scripts/evaluate_behavioral_candidates.sh \
  --candidate-source llm_behavioral \
  --endpoint kunashiri \
  --disable-thinking

Add --with-vitis to run the optional host Vivado/Vitis synthesis step after functional evaluation.

Add an optional host Vivado/Vitis RTL synthesis estimate:

scripts/evaluate_candidate.sh \
  --task hoge_externalproduct_ntt_1024_p64 \
  --with-vitis

The Vitis path synthesizes the task's Verilog top out-of-context with Vivado, using the AutoNTT-style default target of xcu280-fsvh2892-2L-e and a 4.0 ns clock. Override these with --vitis-part, --vitis-clock-period, --vitis-clock-port, --vitis-jobs, --vivado-bin, or --xilinx-settings. When /home/opt/xilinx/Vitis/2023.2/settings64.sh exists, it is sourced by default before host synthesis.

When only Vitis/Vivado is installed on the host and the other build tools should come from Apptainer, use the split runner:

scripts/evaluate_with_apptainer_and_vitis.sh \
  --task hoge_externalproduct_ntt_1024_p64 \
  --with-yosys \
  --sif llm-ntt.sif

Refresh all extracted-RTL reference JSON files with host Vitis synthesis:

scripts/evaluate_baselines_with_vitis.sh --sif llm-ntt.sif

Test Targets

yata_raintt_reference_test: compares streamed YATA INTT and NTT against raintt::TwistINTT/raintt::TwistNTT with USE_COMPRESS. INTT input lane l at cycle c carries coefficient l * 8 + c; NTT output lane l at cycle c corresponds to coefficient l * 8 + c.
hoge_streaming_reference_test: drives HOGE INTTWrap and compares against cuHEpp::TwistINTT.
hoge_externalproduct_ntt_reference_test: drives HOGE ExternalProductWrap and compares the final 32-bit torus output against TFHEpp ExternalProduct<lvl1param>, whose final boundary is TwistNTT.
hoge_nttid_identity_test: drives HOGE NTTid and checks that the combined INTT/NTT pipeline returns the original polynomial modulo P.

The HOGE forward NTTWrap manifest, hoge_streaming_ntt_1024_p64, is a lint-only tier0 interface task. Use hoge_externalproduct_ntt_1024_p64 for HOGE forward NTT arithmetic and latency comparisons.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LLM-NTT Examples

Contents

Native Run

Fresh-Clone HLS Reproduction

Apptainer Run

Manual Steps

Test Targets

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
apptainer		apptainer
baselines/extracted-rtl		baselines/extracted-rtl
docs		docs
examples/autontt		examples/autontt
licenses		licenses
scripts		scripts
tasks		tasks
tests		tests
third_party		third_party
variants		variants
.gitignore		.gitignore
.gitmodules		.gitmodules
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
NOTICE.md		NOTICE.md
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

LLM-NTT Examples

Contents

Native Run

Fresh-Clone HLS Reproduction

Apptainer Run

Manual Steps

Test Targets

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages