SchemaCommandLine

SchemaCommandLine is DataSchemer’s schema-driven command-line engine.

It combines:

  • argparse parsing (including optional argcomplete tab-completion)
  • YAML input files (optional)
  • an optional required positional file (per-command metadata)
  • SchemaProjector validation and typing
  • optional object instantiation (target_class)
  • and an optional print-attribute facility for introspecting results

Where SchemaProjector is the typing/validation core, SchemaCommandLine is the CLI orchestration layer.


High-level flow

When you construct a SchemaCommandLine, it performs all work immediately:

  1. Resolve schema form/name (normalize_schema_inputs)
  2. Resolve print_attribute configuration (with inheritance)
  3. Build schema variables (including inheritance/copy/update/delete)
  4. Build an argparse parser from schema variables
  5. Parse CLI args (strictly; allow_abbrev=False)
  6. Optionally load YAML input files
  7. Optionally read a required positional file (and optionally parse it as YAML)
  8. Merge all data sources into a single raw input mapping
  9. Validate and type the data via SchemaProjector
  10. Optionally instantiate target_class
  11. Run _operations() (override point for custom logic)
  12. Optionally print attributes requested via --print-attribute

This “do everything in __init__” approach keeps the class simple to embed: construct it once and you either get a successful run or a structured user error.


Basic example

The typical usage pattern is to subclass, set class attributes as needed, and construct the subclass with a schema:

from data_schemer.schema_command_line import SchemaCommandLine

class MyCLI(SchemaCommandLine):
  pass

schema = {
  "variables": {
    "a": {"type": "int"},
    "b": {"type": "float"},
  }
}

# Equivalent of: MyCLI(schema_definitions=schema, argv=[...])
MyCLI(schema, argv=["--a", "1", "--b", "2.5"])

In practice, projects usually supply schema_definitions loaded from YAML, and set a target_class so a typed object is available in _operations().


Entry point: main()

SchemaCommandLine.main(..., debug: bool = False) -> int

main() is a convenience wrapper that:

  • returns a process exit code (0 on success)
  • catches DSUserError and prints a friendly message to stderr
  • suppresses tracebacks unless debug=True

This is the recommended entry point for console_scripts:

import sys

if __name__ == "__main__":
  # Exit with the code returned by main(); show tracebacks only with --debug.
  raise SystemExit(MyCLI.main(schema, debug="--debug" in sys.argv))
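
The wrapper behavior described above can be sketched in isolation. This is a minimal illustration, not DataSchemer's implementation: UserError is a hypothetical stand-in for DSUserError, run_cli is a hypothetical helper, and the exit code 1 for failures is an assumption.

```python
import sys
import traceback


class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def run_cli(run, debug=False):
    """Call run() and translate user errors into a process exit code."""
    try:
        run()
    except UserError as exc:
        if debug:
            traceback.print_exc()  # full traceback only when debugging
        print(f"error: {exc}", file=sys.stderr)
        return 1  # nonzero exit code; the exact value is an assumption
    return 0


def failing():
    """Simulate a run that ends in a user-facing error."""
    raise UserError("missing required variable: a")
```

Other exception types are deliberately not caught here, so programming errors still surface with a traceback.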

Schema-driven argparse

SchemaCommandLine builds an argparse parser from the resolved schema variables using:

schema_to_argparse(schema_vars, description=..., masquerade_usage=True)

Key behaviors

  • Strict parsing: allow_abbrev=False to avoid ambiguous/accidental option matches.
  • No argparse-required: argparse does not enforce required arguments; the projector does.
  • Schema is the contract: help, metavar, aliases, choices, bool flags are schema-driven.
  • Improved UX: required variables get a (required) marker in colored help (TTY only).
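
The effect of strict parsing can be seen with plain argparse. The sketch below uses only the standard library and an illustrative --iterations option; it does not involve schema_to_argparse.

```python
import argparse

strict = argparse.ArgumentParser(allow_abbrev=False)
strict.add_argument("--iterations", type=int)

lenient = argparse.ArgumentParser()  # allow_abbrev defaults to True
lenient.add_argument("--iterations", type=int)

# Strict parser: "--iter" is not treated as "--iterations"; it is left over.
strict_ns, strict_extras = strict.parse_known_args(["--iter", "3"])

# Lenient parser: the unambiguous prefix "--iter" silently matches.
lenient_ns, lenient_extras = lenient.parse_known_args(["--iter", "3"])
```

With the strict parser the leftover tokens are available for reporting as unknown options, instead of being accepted by accident.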

Variable → CLI option mapping

A schema variable named foo_bar becomes:

  • --foo-bar

Aliases may be provided via alias:

  • if an alias starts with -, it is used as-is (short option)
  • otherwise it becomes --alias-name

Example:

variables:
  iterations:
    type: int
    alias: ["-n", "num_iterations"]

This yields options:

  • -n
  • --num-iterations
  • --iterations
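
The naming rule above can be expressed as a small function. This is a sketch of the rule as described, with option_strings as a hypothetical helper name; the ordering of the returned options is an assumption.

```python
def option_strings(name, aliases=()):
    """Return CLI option strings for a schema variable and its aliases."""
    def to_long(identifier):
        # foo_bar -> --foo-bar
        return "--" + identifier.replace("_", "-")

    options = [to_long(name)]
    for alias in aliases:
        # Aliases starting with "-" are used as-is; others become long options.
        options.append(alias if alias.startswith("-") else to_long(alias))
    return options
```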

Data sources and precedence

SchemaCommandLine can merge values from up to three sources:

  1. YAML input files (optional, via an “input files” variable in the schema)
  2. required_file (optional positional argument configured by schema metadata)
  3. CLI options (always available)

Precedence is:

  1. YAML input files (lowest)
  2. required_file schema data (middle)
  3. CLI options (highest)

This is implemented in _merge_sources():

raw_data = dict(self._input_file_data)
raw_data.update(self._required_file_schema_data)
raw_data.update(self._cli_data)

So a user can keep defaults in YAML and override with CLI flags.
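
A standalone illustration of the same precedence, using plain dicts in place of the instance attributes (merge_sources and the sample values are illustrative only):

```python
def merge_sources(input_file_data, required_file_data, cli_data):
    """Merge three mappings; later updates override earlier ones."""
    raw_data = dict(input_file_data)       # lowest precedence
    raw_data.update(required_file_data)    # middle precedence
    raw_data.update(cli_data)              # highest precedence
    return raw_data


merged = merge_sources(
    {"a": 1, "b": 2.5, "c": "yaml"},   # from YAML input files
    {"b": 3.0, "c": "required"},       # from required_file
    {"c": "cli"},                      # from CLI options
)
```

Here "a" survives from YAML, "b" is overridden by the required file, and "c" ends up with the CLI value.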


YAML input files

YAML input files are enabled by defining a schema variable whose name is given by input_files_tag (the constructor default is "input_files"):

  • variable must exist in schema vars
  • value is expected to be a list of file paths (often via string-list)
  • each YAML file must parse to a mapping/dict
  • merged sequentially (later files override earlier)

Passing "-" as a YAML input file path reads YAML from stdin.

Important restriction: you cannot use "-" for both required_file and YAML input files in the same invocation.
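
The merge and stdin rules above can be sketched as follows. The functions operate on already-parsed documents so that no YAML library is needed; merge_documents and check_stdin_usage are hypothetical names, and ValueError stands in for whatever error type the library raises.

```python
def merge_documents(documents):
    """Merge already-parsed YAML documents in order; later ones win."""
    merged = {}
    for index, doc in enumerate(documents):
        if not isinstance(doc, dict):
            raise ValueError(f"input file #{index + 1} did not parse to a mapping")
        merged.update(doc)
    return merged


def check_stdin_usage(required_file, input_files):
    """Reject "-" being used for both the required file and an input file."""
    if required_file == "-" and "-" in input_files:
        raise ValueError("stdin ('-') can only be consumed once per invocation")
```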


required_file metadata

A command may declare a required positional file via schema metadata:

required_file:
  enabled: true
  metavar: FILE
  help: Required input file (use '-' for stdin).
  apply_as_schema_data: false
  read_mode: path      # path|text|binary
  stdin_ok: true

Behavior

  • If enabled: true, argparse adds a positional argument required_file.
  • How the file is consumed depends on read_mode:
    • path: do not read content (only store the path); required_file_content is None
    • text: read text content into required_file_content
    • binary: read bytes into required_file_content
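
A minimal sketch of the three read modes, assuming a regular file path (stdin handling is omitted) and UTF-8 text; read_required_file is a hypothetical helper, not the library's function:

```python
from pathlib import Path


def read_required_file(path, read_mode="path"):
    """Return (path, content) according to read_mode: path, text, or binary."""
    if read_mode == "path":
        return path, None  # content is never read
    if read_mode == "text":
        return path, Path(path).read_text(encoding="utf-8")
    if read_mode == "binary":
        return path, Path(path).read_bytes()
    raise ValueError(f"unknown read_mode: {read_mode!r}")
```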

apply_as_schema_data

If apply_as_schema_data: true, the required file is always read as text and parsed as YAML. The resulting mapping is merged into schema input data at the “required_file” precedence level.

This is useful for commands whose primary input is a YAML document that you want to pass as a required positional argument rather than via --input-files.

Accessors

  • required_file_path
  • required_file_content

Unknown options and suggestions

SchemaCommandLine parses with parse_known_args() to detect unknown options and provide better errors.

  • Unknown options beginning with - raise DSUnknownNameError with “did you mean” suggestions.
  • Unexpected positional arguments raise DSUserError.

Suggestions are computed using close matches over normalized option names (including aliases).

By contrast, default argparse errors report only that the arguments are unrecognized, with no hint about what was intended.
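
One way to compute such suggestions is with difflib from the standard library. The sketch below is an assumption about the general approach, not the library's code; suggest_options, the normalization rule, and the 0.6 cutoff are all illustrative.

```python
import difflib


def suggest_options(unknown, known_options, limit=3):
    """Return known options whose normalized names are close to `unknown`."""
    def normalize(option):
        # Compare without leading dashes and with "_" treated like "-".
        return option.lstrip("-").replace("_", "-")

    by_normalized = {normalize(opt): opt for opt in known_options}
    matches = difflib.get_close_matches(
        normalize(unknown), list(by_normalized), n=limit, cutoff=0.6
    )
    return [by_normalized[m] for m in matches]
```

For example, a mistyped --iteratons would be matched against the known option list and --iterations offered as the closest candidate.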


target_class and object instantiation

If target_class is provided, SchemaCommandLine will:

  1. validate/type inputs using SchemaProjector
  2. call SchemaProjector.get_instance(target_class)
  3. expose the instantiated object via target_obj

Constructor kwargs are filtered by the target class __init__ signature (unknown kwargs ignored).

This makes schemas usable for both:

  • configuration validation
  • object construction
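
Filtering keyword arguments by an __init__ signature can be done with inspect. This is a sketch of the idea only; filter_kwargs and Cell are hypothetical, and passing everything through when __init__ accepts **kwargs is an assumption.

```python
import inspect


def filter_kwargs(target_class, data):
    """Keep only the keys that target_class.__init__ accepts by name."""
    params = inspect.signature(target_class.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(data)  # __init__ takes **kwargs, so nothing is filtered
    return {k: v for k, v in data.items() if k in params and k != "self"}


class Cell:
    """Illustrative target class."""

    def __init__(self, a, b=1.0):
        self.a = a
        self.b = b


# "unused" is dropped because Cell.__init__ has no such parameter.
cell = Cell(**filter_kwargs(Cell, {"a": 2, "b": 0.5, "unused": "x"}))
```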

Extending behavior: _operations()

Override _operations() in subclasses to implement custom logic.

At the time _operations() runs:

  • self.data is available (typed dict)
  • self.target_obj may be available (if target_class provided)
  • required_file state is available

A common pattern is:

  • compute derived quantities
  • write output files
  • call domain-specific libraries


The print_attribute feature

print_attribute is a special schema feature that enables a built-in CLI option:

  • --print-attribute ...

This option lets users request printing one or more @property attributes from the instantiated target object.

It is designed for:

  • interactive exploration
  • debugging / inspection
  • reproducible output (printed in YAML-style using render_mapping)

Requirements

  • print-attribute requires a target_class
  • print-attribute is enabled and configured by schema descriptor configuration (not per-variable)

When enabled, SchemaCommandLine injects a reserved variable into the schema:

  • variable name: print_attribute (reserved; users must not define it)
  • type: string-list
  • optional: true

This injection ensures argparse sees a --print-attribute option without requiring you to bake it into every schema by hand.

Where configuration comes from

SchemaCommandLine resolves print-attribute configuration via:

print_attribute = get_schema_print_attribute(schema_name, schema_definitions)

This supports inheritance and merging rules defined by the schema resolver.

The resolved value may be:

  • False / None: disabled
  • True: enabled, auto mode
  • a dict: enabled, with advanced configuration

After resolution, SchemaCommandLine normalizes the configuration into one of two modes:

Auto mode

Auto mode derives choices by introspecting the target class:

  • all public @property names (including inherited)
  • excluding private names starting with _

Those internal property names are then mapped to external names presented on the CLI.
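
Collecting public @property names, including inherited ones, can be sketched with inspect.getmembers. The helper name and the sample classes are illustrative; the library's own introspection may differ in detail.

```python
import inspect


def public_property_names(cls):
    """List public @property names on cls, including inherited ones."""
    members = inspect.getmembers(cls, lambda m: isinstance(m, property))
    # getmembers returns (name, value) pairs sorted by name.
    return [name for name, _ in members if not name.startswith("_")]


class Base:
    @property
    def volume(self):
        return 60.0


class Crystal(Base):
    @property
    def energy(self):
        return -12.345

    @property
    def _cache(self):  # private: excluded from the result
        return {}
```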

Manual mode

Manual mode uses an explicit list of external choices provided by schema configuration:

  • no class introspection is used to produce the list
  • values are still mapped to internal attribute names before reading from the object

Manual mode is selected when the resolved configuration dict contains a choices key and it is not null.


The most important conceptual point is the distinction between external and internal names:

  • The CLI accepts and displays external names only.
  • The underlying object is accessed using internal attribute names.

DataSchemer supports two mapping systems that may both apply:

  1. Schema variable code_alias (general DataSchemer feature)
  2. print_attribute.alias (specific to print-attribute)

Both contribute to mapping external → internal attribute names for print-attribute.

Schema code_alias interaction

Schema variable definitions may rename how constructor arguments map to the object:

variables:
  a:
    type: int
    code_alias: x

For print-attribute, SchemaCommandLine builds alias maps from schema vars:

  • internal → external
  • external → internal

and merges them with print-attribute’s explicit alias mapping (below).
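
Building the two maps from code_alias entries might look like the sketch below. It assumes the schema variable name is the external spelling and code_alias is the internal one; build_code_alias_maps is a hypothetical name.

```python
def build_code_alias_maps(variables):
    """Build internal->external and external->internal maps from code_alias."""
    int2ext, ext2int = {}, {}
    for external, spec in variables.items():
        internal = spec.get("code_alias")
        if internal is None:
            continue  # no rename: internal and external names coincide
        int2ext[internal] = external
        ext2int[external] = internal
    return int2ext, ext2int


int2ext, ext2int = build_code_alias_maps(
    {"a": {"type": "int", "code_alias": "x"}, "b": {"type": "float"}}
)
```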

In dict form, print-attribute may define an explicit alias mapping:

print_attribute:
  alias:
    external_name: internal_name

This affects only print-attribute resolution (not object construction).

Disjointness requirement

These two alias maps must be disjoint (no overlapping keys); otherwise the mapping for a shared name would be ambiguous. The implementation validates disjointness using merge_disjoint_maps(...) and raises DSUserError if they overlap.
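
A disjoint merge can be sketched in a few lines. This is not the signature or body of merge_disjoint_maps; merge_disjoint is a hypothetical stand-in, and ValueError takes the place of DSUserError.

```python
def merge_disjoint(first, second):
    """Merge two maps, refusing any key that appears in both."""
    overlap = sorted(set(first) & set(second))
    if overlap:
        raise ValueError(f"alias maps overlap on: {', '.join(overlap)}")
    return {**first, **second}
```

Failing loudly on overlap avoids silently choosing one of two conflicting internal names.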


In dict form, print-attribute can constrain and control the allowed --print-attribute values.

Manual choices

print_attribute:
  choices: ["energy", "volume"]

This:

  • forces manual mode
  • restricts CLI candidates strictly to those external spellings
  • drives tab-completion candidates (when argcomplete is installed)

If choices: null, then:

  • the configuration remains enabled
  • but it stays in auto mode
  • and (crucially) exclude is suppressed because the choices key is present (see exclude semantics below)

This “choices key present” rule is intentional: it lets inheritance signal “explicit configuration” even when the final list is not specified.

Runtime injection of choices for argparse

To support argparse choices and tab-completion, SchemaCommandLine dynamically injects the computed choices into the injected print_attribute variable only at parser construction time.


In dict form, print-attribute may define an exclude list:

print_attribute:
  exclude: ["debug", "internal_state"]

Exclude is applied only in true auto mode, meaning:

  • the effective dict does not contain a choices key (including inherited configs)

If choices is present (even as null), exclude is ignored by design.
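
The mode-selection rules for choices and exclude can be summarized as a small decision function. This is a sketch of the rules as stated in the text; resolve_mode and its return shape are hypothetical.

```python
def resolve_mode(config):
    """Return (mode, exclude_applies) for a resolved print_attribute value."""
    if config is None or config is False:
        return "disabled", False
    if config is True:
        return "auto", True
    if config.get("choices") is not None:
        return "manual", False
    # Auto mode: exclude applies only when no choices key is present at all.
    return "auto", "choices" not in config
```

Note the third auto-mode case: a dict with choices set to null stays in auto mode but suppresses exclude.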

Exclude entries are matched by name against both spellings:

  • if you exclude "foo", any auto-derived entry whose internal or external name is "foo" is removed from the list

Consistency between choices and ext2int

In auto mode, exclude removes entries from:

  • the completion/validation choices list
  • the external→internal mapping (ext2int)

This prevents excluded names from reappearing via alias mappings.
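
Keeping the choices list and ext2int consistent might be done as below. The sketch assumes an excluded name removes an entry when it matches either the external or the internal spelling; apply_exclude is a hypothetical helper.

```python
def apply_exclude(choices, ext2int, exclude):
    """Drop excluded names from both the choices list and ext2int."""
    excluded = set(exclude)

    def keep(external):
        internal = ext2int.get(external, external)
        return external not in excluded and internal not in excluded

    kept_choices = [name for name in choices if keep(name)]
    kept_ext2int = {ext: internal for ext, internal in ext2int.items() if keep(ext)}
    return kept_choices, kept_ext2int
```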


Because schema-level print_attribute is inherited and merged, treat it like a small “policy” object:

  • A base schema can enable print-attribute broadly.
  • A derived schema can:
    • add aliases
    • extend choices
    • add exclusions (in auto mode)
    • or disable print-attribute entirely (false)

The resolver and normalizer together ensure that:

  • the CLI only ever exposes external spellings
  • printing uses render_mapping({external_name: value})
  • and mapping is deterministic.

How print-attribute output looks

When --print-attribute is used, SchemaCommandLine prints each requested attribute in YAML style:

energy: -12.345
volume: 60.0

Internally, it prints one mapping per attribute, so output remains stream-friendly.

If the external name cannot be resolved to an attribute on the object, a DSUserError is raised with details (including internal name).
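
The lookup-and-print step can be sketched without the library. Here a plain "name: value" string stands in for render_mapping, ValueError stands in for DSUserError, and format_attributes and Crystal are hypothetical.

```python
def format_attributes(obj, requested, ext2int):
    """Yield one "external: value" line per requested attribute."""
    for external in requested:
        internal = ext2int.get(external, external)
        if not hasattr(obj, internal):
            raise ValueError(
                f"cannot print {external!r}: no attribute {internal!r} on "
                f"{type(obj).__name__}"
            )
        yield f"{external}: {getattr(obj, internal)}"


class Crystal:
    """Illustrative target object with an internally renamed property."""

    @property
    def energy(self):
        return -12.345

    @property
    def vol(self):
        return 60.0


lines = list(format_attributes(Crystal(), ["energy", "volume"], {"volume": "vol"}))
```

The output uses the external name the user typed (volume), while the value is read from the internal attribute (vol).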


Tab completion with argcomplete

If argcomplete is installed, SchemaCommandLine enables completion via:

argcomplete.autocomplete(parser)

For --print-attribute, completion candidates are the computed external choices.

The canonical helper for computing them is:

  • compute_print_attribute_choices(...)

Downstream tools (like PM) should call this helper rather than duplicating logic.
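
Treating argcomplete as an optional dependency usually follows the pattern below. This is a generic sketch using argparse and argcomplete.autocomplete; build_parser is a hypothetical helper, and nargs="+" is an assumption about how a string-list option is exposed.

```python
import argparse

try:
    import argcomplete  # optional dependency
except ImportError:
    argcomplete = None


def build_parser(print_attribute_choices):
    """Build a parser whose --print-attribute values are restricted to choices."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument(
        "--print-attribute",
        nargs="+",
        choices=print_attribute_choices,  # also used as completion candidates
    )
    if argcomplete is not None:
        # Returns immediately unless the shell has requested completions.
        argcomplete.autocomplete(parser)
    return parser
```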


Summary

SchemaCommandLine provides a practical, batteries-included way to build CLIs from schemas:

  • schema → argparse options (+ help/aliases/choices)
  • strict parsing + helpful suggestions
  • merge YAML + required_file + CLI with clear precedence
  • validate/type using SchemaProjector
  • optionally instantiate an object
  • optionally print introspected attributes using a carefully designed external/internal mapping system

The print-attribute feature is intentionally rich because it lives at the boundary between schema naming, object naming, and user-visible CLI affordances.


Last modified January 20, 2026: updated schema command line (6fbd090)