libnvmf: Add the NVMe-oF exclusion list#3523
Conversation
8bf982c to
c9bf398
Compare
|
@igaw — Thanks for the review I've addressed all the comments and force-pushed. Note the history was rewritten (commits re-split per your suggestions), so the series is now 8 commits; here's what changed:
Every commit builds on its own and the unit tests pass. |
|
alright, I am done with this round of review. |
c9bf398 to
eb2cabb
Compare
|
@igaw - Hey Daniel. It was a long day... I've pushed an update that addresses the last round of reviews. Everything is folded into the existing 8-commit series (fixups squashed into the commit that introduced each piece), and every commit still builds on its own. Here's a summary of the substantive changes and the decisions behind them. HostID is now part of the TIDAfter re-reading the NVMe Base Specification (revision 2.3, section 6.3, "Connect Command"), the transport ID now carries Config keys match the
|
When the global context became mandatory, three libnvme_msg() call sites in the no-netdb / no-getifaddrs fallback stubs kept passing NULL. __libnvme_msg() dereferences it (struct libnvme_log *l = &ctx->log), so each stub crashes in any build without netdb (e.g. musl built with -U_GNU_SOURCE). No test exercised those stubs until the exclusion-list unit tests started calling libnvme_ipaddrs_eq(). Spin up a throwaway context for the diagnostic instead of passing NULL. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
This is a preparation patch for the NVMe-oF exclusion list added in a following commit. It introduces utility helpers shared between the registry and the exclusion list. registry.c keeps private helpers to create directories (mkdir -p), write a file atomically via a cloexec temp file, and validate a test-only directory override. The exclusion list needs the same primitives, so move them into util-fabrics.c as internal libnvmf_ helpers and convert registry.c to use them. No functional change: the registry behaviour and unit tests are unchanged. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Add libnvmf_tid, the owned, hashable transport-ID that uniquely identifies a full host-to-controller path (an NVMe-oF association). It is the identity the exclusion list matches on, and the basis other fabrics features build their per-connection identity on. Split out from the exclusion-list commit so it can be reviewed on its own. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Add a system-wide NVMe-oF exclusion list (/etc/nvme/exclusions/*.conf) naming controllers that orchestrators must not auto-connect. Enforcement is cooperative; the match is minimal -- only the fields present in an entry are compared. The library API (libnvmf_exclusion_*) is built on a new transport-ID type (libnvmf_tid_*) that canonicalizes a connection's identifying tuple and derives a stable hash from it. An "nvme exclusion" plugin provides create/delete/list/add/remove/edit, and SWIG bindings expose the lists, entries and match query to Python. The exclusion list and the transport ID join the ownership registry as the building blocks that let independent NVMe-oF orchestrators (e.g. nvme-discoverd, nvme-stas) coexist: the registry records who owns a controller, the exclusion list lets an admin keep controllers out of every orchestrator's reach, and the TID gives them all one stable identity for a connection. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Add the `nvme exclusion` command group (list/add/remove/create/delete/ edit) as the CRUD front-end to the exclusion API. Split into its own patch so the library and the management tooling land separately. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Add a -x/--exclude flag to "nvme disconnect" that records a matching exclusion entry before tearing the controller (or subsystem) down, so an orchestrator sees the exclusion before the removal event fires and does not immediately reconnect. The entry is built inside libnvme (libnvmf_exclusion_add_ctrl for a device, libnvmf_exclusion_add_nqn for an NQN); the CLI keeps no knowledge of the on-disk entry format. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Add an overview of the NVMe-oF exclusion list in libnvme (EXCLUSIONS.md) -- the on-disk format, the field keys, the minimal-match semantics, and how it complements the ownership registry as a cooperative layer for orchestrator coexistence -- together with man pages for the six "nvme exclusion" subcommands and the new "nvme disconnect --exclude" flag. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
The hand-written markdown design docs were cluttering the top-level libnvme/ listing. Move REGISTRY.md into a libnvme/design/ subdirectory, kept distinct from libnvme/doc/ (the sphinx / read-the-docs source), to be read directly on GitHub. EXCLUSIONS.md lands there in the same series. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
The exclusion list governs auto-connecting orchestrators, not an explicit "nvme connect", so connect must never be blocked by it. But an operator who hand-connects a controller they previously excluded is overriding their own opt-out, often unknowingly. Under --verbose, consult the list and print a note when the target matches, making the override visible without changing behaviour. Document the courtesy in EXCLUSIONS.md. Signed-off-by: Martin Belanger <martin.belanger@dell.com>
eb2cabb to
3e4e9a6
Compare
|
@igaw - While working on this PR, I uncovered an old regression affecting musl and muon builds when The three stub functions in This issue had gone unnoticed because we didn't have any tests exercising those stubs. With the new TID tests I added, those code paths are now being hit. So, long story short, I had to add a preliminary fix for this regression ahead of the rest of my changes. As a result, the PR now contains 9 commits. |
Summary
This series adds a system-wide NVMe-oF exclusion list: a persistent, administrator-authored set of files under
/etc/nvme/exclusions/naming controllers that no orchestrator may auto-connect. It is a small, cooperative coordination layer — the administrator's opt-out — and the natural companion to the ownership registry that landed earlier.The two are mirror images: the registry records who owns a controller and guards against accidental disconnect (so
disconnect-alldoes not tear down a connection some component depends on); the exclusion list guards against accidental connect (so an orchestrator does not auto-connect a controller the administrator wants left alone). Together with the transport ID introduced here, they are the building blocks that let independent NVMe-oF orchestrators — the initramfs (NBFT/FC boot), a human runningconnect-all,nvme-discoverd,nvme-stas— coexist on one host.Why an exclusion list (and not just editing a config file)
The motivating cases are connections whose intent comes from a source the administrator cannot edit:
For all of these the exclusion list is the only local lever: the one place an administrator can say "not this one", regardless of where the connection request originated. Like the registry, it is cooperative, not enforcement — every orchestrator runs as root and could connect anything; the list simply lets well-behaved tools refrain by agreement.
What's in the series
libnvme: add shared fabrics helpers in preparation for exclusion list— a preparation refactor: extract the privatemkdir -p, atomic-cloexec-tempfile, and test-directory-validation helpers fromregistry.cintoutil-fabrics.cas internallibnvmf_helpers, and convertregistry.cto use them. No functional change.libnvme: add the NVMe-oF exclusion list— the feature: thelibnvmf_exclusion_*API inexclusion.c, built on a new transport-ID type (libnvmf_tid_*intid.c) that canonicalizes a connection's identifying tuple and derives a stable hash from it; annvme exclusionplugin (create/delete/edit/list/add/remove); and SWIG bindings exposing lists/entries/match to Python.nvme-cli/fabrics: add --exclude to nvme disconnect— a-x/--excludeflag onnvme disconnectthat records a matching exclusion entry before tearing a controller (or subsystem) down, so an orchestrator sees the exclusion in place before the removal event fires and does not immediately reconnect. The entry is built inside libnvme (libnvmf_exclusion_add_ctrlfor a device,libnvmf_exclusion_add_nqnfor an NQN); the CLI keeps no knowledge of the on-disk entry format.doc: add NVMe-oF exclusion list documentation— an overview in libnvme (EXCLUSIONS.md, modelled onREGISTRY.md) covering the on-disk format, the field keys, the minimal-match semantics, and the relationship to the registry; man pages for the sixnvme exclusionsubcommands; and thenvme disconnect --excludeflag.Design notes
.conffiles, each holdingexclusion = key=value;key=valuelines (#comments). By conventionuser.confis whatnvme disconnect --excludeappends to. World-readable, root-writable; writes are atomic (mkstemp→fsync→rename).tid.h), the same building block the registry and discoverd use.nvme connect <args>ornvme disconnect <device>, which are single, targeted human actions.Testing
libnvme - tidandlibnvme - exclusion(run undermeson test, using theNVME_EXCLUSION_DIR/tmpoverride).meson test: all green, 2 expected-fail unchanged).Follow-ups (not in this series)
nvme connect-allshould honor the exclusion list.connect-allis an orchestration command (it auto-connects DLP/NBFT/config entries), so it should skip excluded controllers. The intended placement is libnvme's discovery-connect path (covering all ofconnect-all's sources at once, while leaving the targetednvme connectpath unaffected). Deferred to its own PR after this series lands.host-traddr/host_traddr,host-iface/host_iface) and bothnqn/subsysnqn. Whether to settle on one canonical form is still under discussion; the docs describe current behavior.