SchemaCommandLine

SchemaCommandLine is DataSchemer’s schema-driven command-line engine.

It combines:

  • argparse parsing (including optional argcomplete tab-completion)
  • YAML input files (optional)
  • an optional required positional file (per-command metadata)
  • SchemaProjector validation and typing
  • optional object instantiation (target_class)
  • and an optional print-attribute facility for introspecting results

Where SchemaProjector is the typing/validation core, SchemaCommandLine is the CLI orchestration layer.


High-level flow

When you construct a SchemaCommandLine, it performs all work immediately:

  1. Resolve schema form/name (normalize_schema_inputs)
  2. Resolve print_attribute configuration (with inheritance)
  3. Build schema variables (including inheritance/copy/update/delete)
  4. Build an argparse parser from schema variables
  5. Parse CLI args (strictly; allow_abbrev=False)
  6. Optionally load YAML input files
  7. Optionally read a required positional file (and optionally parse it as YAML)
  8. Merge all data sources into a single raw input mapping
  9. Validate and type the data via SchemaProjector
  10. Optionally instantiate target_class
  11. Run _operations() (override point for custom logic)
  12. Optionally print attributes requested via --print-attribute

This “do everything in __init__” approach keeps the class simple to embed: construct it once and you either get a successful run or a structured user error.


Basic example

The typical usage pattern is to subclass and set a few class attributes:

from data_schemer.schema_command_line import SchemaCommandLine

class MyCLI(SchemaCommandLine):
  pass

schema = {
  "variables": {
    "a": {"type": "int"},
    "b": {"type": "float"},
  }
}

# Equivalent of: MyCLI(schema_definitions=schema, argv=[...])
MyCLI(schema, argv=["--a", "1", "--b", "2.5"])

In practice, projects usually supply schema_definitions loaded from YAML, and set a target_class so a typed object is available in _operations().


Entry point: main()

SchemaCommandLine.main(..., debug: bool = False) -> int

main() is a convenience wrapper that:

  • returns a process exit code (0 on success)
  • catches DSUserError and prints a friendly message to stderr
  • suppresses tracebacks unless debug=True

This is the recommended entry point for console_scripts:

import sys

if __name__ == "__main__":
  raise SystemExit(MyCLI.main(schema, debug="--debug" in sys.argv))
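
The wrapper behavior listed above can be sketched in isolation. This is not the library's code: UserError is a hypothetical stand-in for DSUserError, run_cli is an illustrative name, and the non-zero exit code of 1 is an assumption (the text only states 0 on success).

```python
import sys


class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def run_cli(run, debug=False):
    """Call run() and translate user errors into a process exit code."""
    try:
        run()
    except UserError as exc:
        if debug:
            raise  # assumption: debug mode lets the traceback surface
        print(f"error: {exc}", file=sys.stderr)
        return 1  # assumption: any non-zero code signals failure
    return 0


def failing():
    raise UserError("missing required variable 'a'")


print(run_cli(lambda: None))  # 0
print(run_cli(failing))       # 1 (the message goes to stderr)
```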

Schema-driven argparse

SchemaCommandLine builds an argparse parser from the resolved schema variables using:

schema_to_argparse(schema_vars, description=..., masquerade_usage=True)

Key behaviors

  • Strict parsing: allow_abbrev=False to avoid ambiguous/accidental option matches.
  • No argparse-required: argparse does not enforce required arguments; the projector does.
  • Schema is the contract: help, metavar, aliases, choices, bool flags are schema-driven.
  • Improved UX: required variables get a (required) marker in colored help (TTY only).
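
The effect of strict parsing can be seen with plain argparse. The sketch below does not use DataSchemer at all; it only contrasts allow_abbrev=False with the argparse default, using a hypothetical --iterations option.

```python
import argparse

# Strict parser: abbreviations such as --iter are NOT expanded.
strict = argparse.ArgumentParser(prog="prog", allow_abbrev=False)
strict.add_argument("--iterations", type=int)  # note: no required=True
strict_ns, strict_unknown = strict.parse_known_args(["--iter", "3"])

# Default parser: --iter is silently accepted as --iterations.
lax = argparse.ArgumentParser(prog="prog")
lax.add_argument("--iterations", type=int)
lax_ns, lax_unknown = lax.parse_known_args(["--iter", "3"])

print(strict_ns.iterations, strict_unknown)  # None ['--iter', '3']
print(lax_ns.iterations, lax_unknown)        # 3 []
```

Because the strict parser leaves the unmatched tokens in the "unknown" list, a later step can turn them into a targeted error instead of accepting an accidental match.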

Variable → CLI option mapping

A schema variable named foo_bar becomes:

  • --foo-bar

Aliases may be provided via alias:

  • if an alias starts with -, it is used as-is (short option)
  • otherwise it becomes --alias-name

Example:

variables:
  iterations:
    type: int
    alias: ["-n", "num_iterations"]

This yields options:

  • -n
  • --num-iterations
  • --iterations
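
The mapping rules above can be sketched as a small function. option_strings is a hypothetical helper written for illustration, not the function DataSchemer uses, and the order of the returned options is an assumption.

```python
def option_strings(name, aliases=()):
    """Return the CLI option strings for one schema variable (illustrative)."""
    options = []
    for alias in aliases:
        if alias.startswith("-"):
            options.append(alias)  # used as-is, e.g. a short option
        else:
            options.append("--" + alias.replace("_", "-"))
    options.append("--" + name.replace("_", "-"))
    return options


print(option_strings("iterations", ["-n", "num_iterations"]))
# ['-n', '--num-iterations', '--iterations']
print(option_strings("foo_bar"))
# ['--foo-bar']
```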

Child-schemas

Child-schemas let a schema define a nested command grammar using bare tokens (not flags), with schema-driven options available at each level. The child-schema feature is designed to be:

  • deterministic (schema-declared ordering; no argparse subparsers)
  • strict (unknown tokens/options are errors)
  • nestable (grandchild-schemas and deeper)
  • completion-friendly (context-aware token/option completion when argcomplete is installed)

This section documents the current implementation behavior.

Declaring child-schemas

A schema may declare a list of child-schemas using child_schemas:

root:
  variables:
    # root options...
  child_schemas: [post_process, export]

post_process:
  description: Post-processing controls
  variables:
    level:
      type: int
      optional: true

export:
  description: Export controls
  variables:
    format:
      type: string
      optional: true
      choices: [json, yaml]

Notes:

  • child_schemas may be a string or a list of strings.
  • Child-schemas accumulate through inheritance (bases first, then child), with duplicates removed while preserving first occurrence order.
  • Child-schemas may themselves declare child_schemas, allowing arbitrary nesting depth.
  • Child-schemas must not define required_file (enforced by SchemaProjector schema validation).
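
The accumulation rule in the notes (bases first, duplicates dropped, first occurrence kept) can be sketched as follows. accumulate_child_schemas is a hypothetical name, and the inputs are assumed to be the child_schemas values collected along the inheritance chain, base-most first.

```python
def accumulate_child_schemas(*declarations):
    """Merge child_schemas declarations in order, keeping first occurrences."""
    merged = []
    for decl in declarations:
        # A declaration may be a single string or a list of strings.
        names = [decl] if isinstance(decl, str) else list(decl or [])
        for name in names:
            if name not in merged:
                merged.append(name)
    return merged


print(accumulate_child_schemas(["post_process", "export"], "export", ["report", "post_process"]))
# ['post_process', 'export', 'report']
```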

CLI shape: bare tokens + options

Child-schemas appear on the CLI as bare tokens in kebab-case:

prog post-process --level 2
prog export --format yaml

Tokens are schema names rendered as:

  • schema name (snake_case): post_process
  • CLI token (kebab-case): post-process

Nested child-schemas (grandchild-schemas)

Example nesting:

root:
  child_schemas: [post_process]

post_process:
  child_schemas: [advanced]
  variables:
    level:
      type: int
      optional: true

advanced:
  child_schemas: [knobs]
  variables:
    adv:
      type: int
      optional: true

knobs:
  variables:
    k:
      type: int
      optional: true

CLI:

prog post-process --level 1 advanced --adv 2 knobs --k 3
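
One way to picture how such a command line is read is to split it into per-schema segments at each bare token. The sketch below is a simplified illustration, not the library's parser: split_segments and the CHILDREN table are hypothetical, only descent into children is handled, and no errors are raised for unknown tokens.

```python
# Child relationships from the nesting example above (schema names, snake_case).
CHILDREN = {
    "root": ["post_process"],
    "post_process": ["advanced"],
    "advanced": ["knobs"],
    "knobs": [],
}


def split_segments(argv, children=CHILDREN, root="root"):
    """Group argv into (schema_name, args) segments, descending on bare tokens."""
    current = root
    segments = [(root, [])]
    for arg in argv:
        name = arg.replace("-", "_")  # kebab-case token -> schema name
        if not arg.startswith("-") and name in children[current]:
            current = name
            segments.append((current, []))
        else:
            segments[-1][1].append(arg)
    return segments


argv = "post-process --level 1 advanced --adv 2 knobs --k 3".split()
for schema_name, args in split_segments(argv):
    print(schema_name, args)
# root []
# post_process ['--level', '1']
# advanced ['--adv', '2']
# knobs ['--k', '3']
```

A limitation of this simplification: an option value that happens to equal a child token would be treated as a token.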

child_schema_mode: exclusive vs inclusive

Each schema may set:

child_schema_mode: exclusive   # default
# or
child_schema_mode: inclusive

Meaning:

  • exclusive (default): once you descend into a child schema, you may only enter its descendants next. You cannot jump to a sibling token of the parent while inside the child.
  • inclusive: allows a one-level sibling jump among this schema's children: while inside one child, you may move directly to another child of the same parent. The setting is read from the parent, so it is the parent that must be inclusive.

This is intentionally limited to one level; the parser does not walk up multiple ancestors to search for siblings.
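
The two modes can be compared with a small sketch that computes which tokens are acceptable next. allowed_tokens and its arguments are hypothetical names for illustration; the real context resolver is not shown in this document.

```python
def allowed_tokens(current, parent, children, modes):
    """Return the CLI tokens that may follow while configuring `current`."""
    allowed = list(children.get(current, []))  # descending is always allowed
    # One-level sibling jump: only the direct parent's mode is consulted.
    if parent is not None and modes.get(parent, "exclusive") == "inclusive":
        for sibling in children.get(parent, []):
            if sibling != current and sibling not in allowed:
                allowed.append(sibling)
    return [name.replace("_", "-") for name in allowed]


children = {
    "root": ["post_process", "export"],
    "post_process": ["advanced"],
    "export": [],
    "advanced": [],
}

print(allowed_tokens("post_process", "root", children, {}))
# ['advanced']
print(allowed_tokens("post_process", "root", children, {"root": "inclusive"}))
# ['advanced', 'export']
print(allowed_tokens("advanced", "post_process", children, {"root": "inclusive"}))
# []
```

The last call shows the one-level limit: root is inclusive, but from inside advanced only post_process (its direct parent) is consulted, so export is not reachable.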

Error messages for invalid tokens

When a user types a token that exists somewhere in the schema tree but is invalid in the current context, SchemaCommandLine raises a DSUserError that is explicit about categories:

  • what schema you were configuring,
  • that the invalid thing is a command token (not an option),
  • what you are allowed to do next (set options vs enter child-commands).

Example form:

Invalid command 'knobs' while configuring 'post_process'.

Here you may:
  • set options for 'post_process' (e.g. --level)
  • enter child-commands of 'post_process': advanced

Help listing for child-schemas

When child-schemas are present, help output includes a dedicated group listing tokens (not flags). The group label is configurable per schema via:

child_schema_label: Commands

Each listed token can also show a short description if the referenced schema provides a description field.

Tab completion (argcomplete)

When argcomplete is installed and active:

  • At a given point in the command line, completion suggests either:
    • child-schema tokens valid in the current context, or
    • options for the current context schema.
  • Completion is context-aware and supports arbitrary nesting depth.

Important: child-schemas are implemented without argparse subparsers. Completion uses a completion-only parser plus a grammar-aware context resolver, so you still get clean help output and deterministic behavior.

Data merging with child-schemas

Child-schema data may come from YAML input files and from CLI segments.

When both provide values for the same child-schema block:

  • YAML provides the baseline mapping for that block
  • CLI overrides keys inside that block

This behavior is implemented by merging nested mappings for each child-schema name before SchemaProjector validation/typing.
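
A minimal sketch of that per-block merge, assuming both sources have already been parsed into plain dicts. merge_with_child_blocks is a hypothetical name, the indent key is made up for the example, and only one level of nesting is handled here.

```python
def merge_with_child_blocks(yaml_data, cli_data, child_names):
    """Merge CLI data over YAML data, merging child-schema blocks key by key."""
    merged = dict(yaml_data)
    for key, value in cli_data.items():
        both_mappings = isinstance(value, dict) and isinstance(merged.get(key), dict)
        if key in child_names and both_mappings:
            block = dict(merged[key])  # YAML block is the baseline
            block.update(value)        # CLI keys override inside the block
            merged[key] = block
        else:
            merged[key] = value        # top-level keys: CLI simply wins
    return merged


yaml_data = {"a": 1, "export": {"format": "json", "indent": 2}}
cli_data = {"export": {"format": "yaml"}}
print(merge_with_child_blocks(yaml_data, cli_data, {"export"}))
# {'a': 1, 'export': {'format': 'yaml', 'indent': 2}}
```

Without the per-block merge, a plain dict.update() would replace the whole export block and silently drop the YAML-only key.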


Data sources and precedence

SchemaCommandLine can merge values from up to two sources:

  1. YAML input files (optional, via an “input files” variable in the schema)
  2. CLI options (always available)

Precedence is:

  1. YAML input files (lowest)
  2. CLI options (highest)

This is implemented in _merge_sources():

raw_data = dict(self._input_file_data)
raw_data.update(self._cli_data)

So a user can keep defaults in YAML and override with CLI flags.


YAML input files

YAML input files are enabled by defining a schema variable whose name is given by the constructor argument input_files_tag (default "input_files"):

  • variable must exist in schema vars
  • value is expected to be a list of file paths (often via string-list)
  • each YAML file must parse to a mapping/dict
  • merged sequentially (later files override earlier)

stdin can be used by passing "-" as a YAML input file path (meaning read YAML from stdin).

Important restriction: you cannot use "-" for both required_file and YAML input files in the same invocation.


required_file metadata

A command may declare a required positional file via schema metadata:

required_file:
  enabled: true
  metavar: FILE
  help: Required input file (use '-' for stdin).
  read_mode: path      # path|text|binary
  stdin_ok: true

Behavior

  • If enabled: true, argparse adds a positional argument required_file.
  • How the file is consumed depends on read_mode:
    • path: do not read content (only store the path); required_file_content is None
    • text: read text content into required_file_content
    • binary: read bytes into required_file_content
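
The three read modes can be sketched as a single dispatch. read_required_file is a hypothetical helper; the handling of "-" as stdin and the error type are assumptions, and the stdin_ok check is omitted.

```python
import sys
import tempfile
from pathlib import Path


def read_required_file(path, read_mode="path"):
    """Return the content to store for a required file, per read_mode."""
    if read_mode == "path":
        return None  # only the path is kept; nothing is read
    use_stdin = str(path) == "-"
    if read_mode == "text":
        return sys.stdin.read() if use_stdin else Path(path).read_text()
    if read_mode == "binary":
        return sys.stdin.buffer.read() if use_stdin else Path(path).read_bytes()
    raise ValueError(f"unknown read_mode: {read_mode!r}")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "input.txt"
    sample.write_text("hello")
    print(read_required_file(sample, "path"))    # None
    print(read_required_file(sample, "text"))    # hello
    print(read_required_file(sample, "binary"))  # b'hello'
```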

apply_as_schema_data

If apply_as_schema_data: true, the required file is always read as text and parsed as YAML. The resulting mapping is merged into schema input data at the “required_file” precedence level.

This is useful for commands where the primary input is a YAML block but you want it as a required positional rather than --input-file.

Accessors

  • required_file_path
  • required_file_content

Unknown options and suggestions

SchemaCommandLine parses with parse_known_args() to detect unknown options and provide better errors.

  • Unknown options beginning with - raise DSUnknownNameError with “did you mean” suggestions.
  • Unexpected positional arguments raise DSUserError.

Suggestions are computed using close matches over normalized option names (including aliases).
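
The standard library's difflib offers this kind of close matching. The sketch below assumes difflib-style matching and skips the name normalization step, neither of which is confirmed by this document; suggest and the option list are illustrative.

```python
import difflib


def suggest(unknown, options, n=3, cutoff=0.6):
    """Return up to n known options that closely resemble the unknown one."""
    return difflib.get_close_matches(unknown, options, n=n, cutoff=cutoff)


known_options = ["--iterations", "--num-iterations", "-n", "--level"]

matches = suggest("--iteratons", known_options)
print(matches[0])                      # --iterations (the best match comes first)
print(suggest("--zzz", known_options)) # []
```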

Default argparse reports such input only as "unrecognized arguments", without suggesting alternatives.


target_class and object instantiation

If target_class is provided, SchemaCommandLine will:

  1. validate/type inputs using SchemaProjector
  2. call SchemaProjector.get_instance(target_class)
  3. expose the instantiated object via target_obj

Constructor kwargs are filtered by the target class __init__ signature (unknown kwargs ignored).
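
Filtering by signature can be sketched with inspect. filter_kwargs and Box are hypothetical; passing everything through when __init__ accepts **kwargs is an assumption about a sensible behavior, not a documented rule.

```python
import inspect


def filter_kwargs(cls, data):
    """Keep only the entries of data that cls.__init__ can accept."""
    params = inspect.signature(cls.__init__).parameters
    # Assumption: a **kwargs parameter means every key is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(data)
    return {k: v for k, v in data.items() if k in params and k != "self"}


class Box:
    def __init__(self, width, height=1.0):
        self.width = width
        self.height = height


typed = {"width": 2, "height": 3.0, "color": "red"}
kwargs = filter_kwargs(Box, typed)
print(kwargs)  # {'width': 2, 'height': 3.0}
box = Box(**kwargs)
```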

This makes schemas usable both for:

  • configuration validation
  • object construction

Extending behavior: _operations()

Override _operations() in subclasses to implement custom logic.

At the time _operations() runs:

  • self.data is available (typed dict)
  • self.target_obj may be available (if target_class provided)
  • required_file state is available

A common pattern is:

  • compute derived quantities
  • write output files
  • call domain-specific libraries

print_attribute and --print-attribute

print_attribute is a special schema feature that enables a built-in CLI option:

  • --print-attribute ...

This option lets users request printing one or more @property attributes from the instantiated target object.

It is designed for:

  • interactive exploration
  • debugging / inspection
  • reproducible output (printed in YAML-style using render_mapping)

Requirements

  • print-attribute requires a target_class
  • print-attribute is enabled and configured by schema descriptor configuration (not per-variable)

When enabled, SchemaCommandLine injects a reserved variable into the schema:

  • variable name: print_attribute (reserved; users must not define it)
  • type: string-list
  • optional: true

This injection ensures argparse sees a --print-attribute option without requiring you to bake it into every schema by hand.

Where configuration comes from

SchemaCommandLine resolves print-attribute configuration via:

print_attribute = get_schema_print_attribute(schema_name, schema_definitions)

This supports inheritance and merging rules defined by the schema resolver.

The resolved value may be:

  • False / None: disabled
  • True: enabled, auto mode
  • a dict: enabled, with advanced configuration

After resolution, SchemaCommandLine normalizes the configuration into one of two modes:

Auto mode

Auto mode derives choices by introspecting the target class:

  • all public @property names (including inherited)
  • excluding private names starting with _

Those internal property names are then mapped to external names presented on the CLI.
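
Collecting the public property names of a class, inherited ones included, can be sketched with inspect. public_property_names and the example classes are hypothetical; the alphabetical order comes from inspect.getmembers and is not necessarily the order DataSchemer uses.

```python
import inspect


def public_property_names(cls):
    """Return names of all public @property attributes, including inherited."""
    members = inspect.getmembers(cls, lambda m: isinstance(m, property))
    return [name for name, _ in members if not name.startswith("_")]


class Base:
    @property
    def energy(self):
        return -12.345


class Cell(Base):
    @property
    def volume(self):
        return 60.0

    @property
    def _cache(self):  # private: excluded
        return {}

    def relax(self):   # plain method: not a property
        pass


print(public_property_names(Cell))  # ['energy', 'volume']
```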

Manual mode

Manual mode uses an explicit list of external choices provided by schema configuration:

  • no class introspection is used to produce the list
  • values are still mapped to internal attribute names before reading from the object

Manual mode is selected when the resolved configuration dict contains a choices key and it is not null.


External vs internal names

This is the most important conceptual point:

  • The CLI accepts and displays external names only.
  • The underlying object is accessed using internal attribute names.

DataSchemer supports two mapping systems that may both apply:

  1. Schema variable code_alias (general DataSchemer feature)
  2. print_attribute.code_alias (specific to print-attribute)

Both contribute to mapping external → internal attribute names for print-attribute.

Schema code_alias interaction

Schema variable definitions may rename how constructor arguments map to the object:

variables:
  a:
    type: int
    code_alias: x

For print-attribute, SchemaCommandLine builds alias maps from schema vars:

  • internal → external
  • external → internal

and merges them with print-attribute’s explicit code-alias mapping (below).

print_attribute.code_alias

In dict form, print-attribute may define:

print_attribute:
  code_alias:
    external_name: internal_name

This affects only print-attribute resolution (not object construction).

Disjointness requirement

These two alias maps must be disjoint (no overlapping keys); otherwise the external → internal mapping for a name would be ambiguous. The implementation validates disjointness using merge_disjoint_maps(...) and raises DSUserError if they overlap.
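
A disjoint merge of this kind can be sketched as below. merge_disjoint is a hypothetical function, not the signature of merge_disjoint_maps, and ValueError stands in for DSUserError.

```python
def merge_disjoint(schema_aliases, print_aliases):
    """Merge two external -> internal maps, refusing any shared key."""
    overlap = sorted(set(schema_aliases) & set(print_aliases))
    if overlap:
        # Stand-in for DSUserError in this sketch.
        raise ValueError(f"alias maps overlap on: {', '.join(overlap)}")
    merged = dict(schema_aliases)
    merged.update(print_aliases)
    return merged


print(merge_disjoint({"a": "x"}, {"vol": "volume"}))
# {'a': 'x', 'vol': 'volume'}

try:
    merge_disjoint({"a": "x"}, {"a": "y"})
except ValueError as exc:
    print(exc)  # alias maps overlap on: a
```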


print_attribute choices

In dict form, print-attribute can constrain and control the allowed --print-attribute values.

Manual choices

print_attribute:
  choices: ["energy", "volume"]

This:

  • forces manual mode
  • restricts CLI candidates strictly to those external spellings
  • drives tab-completion candidates (when argcomplete is installed)

If choices: null, then:

  • the configuration remains enabled
  • but it stays in auto mode
  • and (crucially) exclude is suppressed because the choices key is present (see exclude semantics below)

This “choices key present” rule is intentional: it lets inheritance signal “explicit configuration” even when the final list is not specified.
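
The mode rules described so far can be condensed into a small decision function. resolve_mode is a hypothetical name and the returned tuple is an illustrative shape; it only encodes the rules stated in this section.

```python
def resolve_mode(config):
    """Return (mode, exclude_active) for a resolved print_attribute value."""
    if config is None or config is False:
        return ("disabled", False)
    if config is True:
        return ("auto", True)
    has_choices_key = "choices" in config
    manual = has_choices_key and config["choices"] is not None
    # Exclude only applies when no choices key is present at all.
    return ("manual" if manual else "auto", not has_choices_key)


print(resolve_mode({"choices": ["energy", "volume"]}))       # ('manual', False)
print(resolve_mode({"choices": None, "exclude": ["debug"]})) # ('auto', False)
print(resolve_mode({"exclude": ["debug"]}))                  # ('auto', True)
print(resolve_mode(False))                                   # ('disabled', False)
```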

Runtime injection of choices for argparse

To support argparse choices and tab-completion, SchemaCommandLine dynamically injects the computed choices into the injected print_attribute variable only at parser construction time.


exclude semantics

In dict form, print-attribute may define an exclude list:

print_attribute:
  exclude: ["debug", "internal_state"]

Exclude is applied only in true auto mode, meaning:

  • the effective dict does not contain a choices key (including inherited configs)

If choices is present (even as null), exclude is ignored by design.

Exclude entries are written as external spellings:

  • if you exclude "foo", every auto-derived entry whose external or internal name matches "foo" is removed from the list

Consistency between choices and ext2int

In auto mode, exclude removes entries from:

  • the completion/validation choices list
  • the external→internal mapping (ext2int)

This prevents excluded names from reappearing via alias mappings.
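
Keeping the two structures consistent can be sketched as a single filtering step. apply_exclude is a hypothetical helper, and matching an excluded name against both the external and the internal spelling is one reading of the rule above, not confirmed behavior.

```python
def apply_exclude(choices, ext2int, exclude):
    """Drop excluded names from both the choices list and the ext2int map."""
    excluded = set(exclude)

    def keep(external):
        internal = ext2int.get(external, external)
        return external not in excluded and internal not in excluded

    kept_choices = [name for name in choices if keep(name)]
    kept_map = {ext: internal for ext, internal in ext2int.items() if keep(ext)}
    return kept_choices, kept_map


choices = ["energy", "vol", "debug"]
ext2int = {"energy": "energy", "vol": "volume", "debug": "debug"}

print(apply_exclude(choices, ext2int, ["debug"]))
# (['energy', 'vol'], {'energy': 'energy', 'vol': 'volume'})
```

Filtering the map together with the list is what stops an excluded name from being reachable through an alias.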


omit_key

In dict form, print-attribute may define an omit_key flag:

print_attribute:
  omit_key: true

If omit_key is true, SchemaCommandLine prints the raw attribute value (not as name: value mapping).

If omit_key is false (default), output is rendered using render_mapping({external_name: value}, ...).


Output-format controls

SchemaCommandLine also supports optional output-format controls for print-attribute:

  • precision_from
  • array_style_from

These are optional keys inside the print_attribute config dict. Each key names a schema variable. If the user supplies that variable, SchemaCommandLine uses it to control how print-attribute output is rendered.

precision_from

print_attribute:
  precision_from: precision

variables:
  precision:
    type: int
    optional: true
    help: Output precision for print-attribute

Behavior:

  • If the user supplies precision, SchemaCommandLine validates it as an integer and updates the instance precision.
  • If the user does not supply it, SchemaCommandLine keeps the constructor precision (default 8).

array_style_from

print_attribute:
  array_style_from: array_style

variables:
  array_style:
    type: string
    optional: true
    choices: ["clean", "bare"]
    help: Array style for print-attribute output

Behavior:

  • If the user supplies array_style, SchemaCommandLine passes it to render_mapping(..., array_style=...) for print-attribute output.
  • If the user does not supply it, SchemaCommandLine uses the default array-style behavior of render_mapping.

Inheritance of print_attribute

Because schema-level print_attribute is inherited and merged, treat it like a small “policy” object:

  • A base schema can enable print-attribute broadly.
  • A derived schema can:
    • add code-alias mappings
    • extend choices
    • add exclusions (in auto mode)
    • add output-format controls (precision_from, array_style_from)
    • or disable print-attribute entirely (false)

The resolver and normalizer together ensure that:

  • the CLI only ever exposes external spellings
  • printing uses render_mapping({external_name: value}, ...)
  • and mapping is deterministic.

How print-attribute output looks

When --print-attribute is used, SchemaCommandLine prints each requested attribute in YAML style:

energy: -12.345
volume: 60.0

Internally, it prints one mapping per attribute, so output remains stream-friendly.

If the external name cannot be resolved to an attribute on the object, a DSUserError is raised with details (including internal name).


Tab completion with argcomplete

If argcomplete is installed, SchemaCommandLine enables completion via:

argcomplete.autocomplete(parser)

For --print-attribute, completion candidates are the computed external choices.

The canonical helper for computing them is:

  • compute_print_attribute_choices(...)

Downstream tools (like PM) should call this helper rather than duplicating logic.


Summary

SchemaCommandLine provides a practical, batteries-included way to build CLIs from schemas:

  • schema → argparse options (+ help/aliases/choices)
  • strict parsing + helpful suggestions
  • merge YAML + required_file + CLI with clear precedence
  • validate/type using SchemaProjector
  • optionally instantiate an object
  • optionally print introspected attributes using a carefully designed external/internal mapping system

The print-attribute feature is intentionally rich because it lives at the boundary between schema naming, object naming, and user-visible CLI affordances.