SchemaCommandLine

SchemaCommandLine is DataSchemer’s schema-driven command-line engine.

It combines:

argparse parsing (including optional argcomplete tab-completion)
YAML input files (optional)
an optional required positional file (per-command metadata)
SchemaProjector validation and typing
optional object instantiation (target_class)
and an optional print-attribute facility for introspecting results

Where SchemaProjector is the typing/validation core, SchemaCommandLine is the CLI orchestration layer.

High-level flow

When you construct a SchemaCommandLine, it performs all work immediately:

Resolve schema form/name (normalize_schema_inputs)
Resolve print_attribute configuration (with inheritance)
Build schema variables (including inheritance/copy/update/delete)
Build an argparse parser from schema variables
Parse CLI args (strictly; allow_abbrev=False)
Optionally load YAML input files
Optionally read a required positional file (and optionally parse it as YAML)
Merge all data sources into a single raw input mapping
Validate and type the data via SchemaProjector
Optionally instantiate target_class
Run _operations() (override point for custom logic)
Optionally print attributes requested via --print-attribute

This “do everything in __init__” approach keeps the class simple to embed: construct it once and you either get a successful run or a structured user error.

Basic example

The typical usage pattern is to subclass and set a few class attributes:

from data_schemer.schema_command_line import SchemaCommandLine

class MyCLI(SchemaCommandLine):
  pass

schema = {
  "variables": {
    "a": {"type": "int"},
    "b": {"type": "float"},
  }
}

# Equivalent of: MyCLI(schema_definitions=schema, argv=[...])
MyCLI(schema, argv=["--a", "1", "--b", "2.5"])

In practice, projects usually supply schema_definitions loaded from YAML, and set a target_class so a typed object is available in _operations().

Entry point: `main()`

SchemaCommandLine.main(..., debug: bool = False) -> int

main() is a convenience wrapper that:

returns a process exit code (0 on success)
catches DSUserError and prints a friendly message to stderr
suppresses tracebacks unless debug=True

This is the recommended entry point for console_scripts:

import sys

if __name__ == "__main__":
  raise SystemExit(MyCLI.main(schema, debug="--debug" in sys.argv))
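
The wrapper behaviour described above can be pictured with a small standalone sketch. It is not DataSchemer's implementation: the DSUserError class here is a local stand-in, run_with_exit_code is a hypothetical name, and the exit code 1 for failures is an assumption (the text above only states that success returns 0).

```python
import sys


class DSUserError(Exception):
    """Stand-in for DataSchemer's user-facing error (hypothetical definition)."""


def run_with_exit_code(action, debug=False, stderr=None):
    """Run action(); return 0 on success or 1 after reporting a DSUserError."""
    stream = sys.stderr if stderr is None else stderr
    try:
        action()
    except DSUserError as exc:
        if debug:
            raise  # debug mode: let the full traceback surface
        print(f"error: {exc}", file=stream)
        return 1  # assumption: any non-zero code signals failure
    return 0
```

Returning the code instead of calling sys.exit() inside the wrapper keeps it easy to test and leaves the final raise SystemExit(...) to the caller.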

Schema-driven argparse

SchemaCommandLine builds an argparse parser from the resolved schema variables using:

schema_to_argparse(schema_vars, description=..., masquerade_usage=True)

Key behaviors

Strict parsing: allow_abbrev=False to avoid ambiguous/accidental option matches.
No argparse-required: argparse does not enforce required arguments; the projector does.
Schema is the contract: help, metavar, aliases, choices, bool flags are schema-driven.
Improved UX: required variables get a (required) marker in colored help (TTY only).
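
The effect of strict parsing can be seen with plain argparse. The sketch below uses only the standard library; build_parser is a hypothetical helper and the --iterations option is just an illustration, not something schema_to_argparse is known to generate in this exact form.

```python
import argparse


def build_parser(allow_abbrev):
    """Build a tiny parser; note that no argument is marked required=True."""
    parser = argparse.ArgumentParser(prog="prog", allow_abbrev=allow_abbrev)
    parser.add_argument("--iterations", type=int)
    return parser


# With abbreviations disabled, "--iter" is not silently expanded to
# "--iterations"; parse_known_args() hands it back as an unknown argument.
strict_ns, strict_extras = build_parser(False).parse_known_args(["--iter", "5"])

# With argparse's default (allow_abbrev=True) the prefix is accepted.
lenient_ns, lenient_extras = build_parser(True).parse_known_args(["--iter", "5"])
```

Because nothing is marked required at the argparse level, a missing value only surfaces later, when the projector validates the merged data.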

Variable → CLI option mapping

A schema variable named foo_bar becomes:

--foo-bar

Aliases may be provided via alias:

if an alias starts with -, it is used as-is (short option)
otherwise it becomes --alias-name

Example:

variables:
  iterations:
    type: int
    alias: ["-n", "num_iterations"]

This yields options:

-n
--num-iterations
--iterations
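
The mapping rules above can be sketched as a small function. This is an illustration only; option_strings is a hypothetical name and is not part of DataSchemer's API.

```python
def option_strings(name, aliases=()):
    """Return CLI option strings for a schema variable and its aliases."""
    options = []
    for alias in aliases:
        if alias.startswith("-"):
            options.append(alias)  # used as-is (short option)
        else:
            options.append("--" + alias.replace("_", "-"))
    options.append("--" + name.replace("_", "-"))
    return options


print(option_strings("iterations", ["-n", "num_iterations"]))
```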

Child-schemas

Child-schemas let a schema define a nested command grammar using bare tokens (not flags), with schema-driven options available at each level. The child-schema feature is designed to be:

deterministic (schema-declared ordering; no argparse subparsers)
strict (unknown tokens/options are errors)
nestable (grandchild-schemas and deeper)
completion-friendly (context-aware token/option completion when argcomplete is installed)

This section documents the current implementation behavior.

Declaring child-schemas

A schema may declare a list of child-schemas using child_schemas:

root:
  variables:
    # root options...
  child_schemas: [post_process, export]

post_process:
  description: Post-processing controls
  variables:
    level:
      type: int
      optional: true

export:
  description: Export controls
  variables:
    format:
      type: string
      optional: true
      choices: [json, yaml]

Notes:

child_schemas may be a string or a list of strings.
Child-schemas accumulate through inheritance (bases first, then child), with duplicates removed while preserving first occurrence order.
Child-schemas may themselves declare child_schemas, allowing arbitrary nesting depth.
Child-schemas must not define required_file (enforced by SchemaProjector schema validation).
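
The accumulation rule (bases first, then child, duplicates dropped while keeping the first occurrence) can be sketched as follows. accumulate_child_schemas is a hypothetical helper written for this page, and the "report" schema name is made up for the example.

```python
def accumulate_child_schemas(*declarations):
    """Combine child_schemas declarations in order, dropping later duplicates.

    Each declaration may be a string, a list of strings, or None.
    """
    combined = []
    for declaration in declarations:
        if declaration is None:
            continue
        names = [declaration] if isinstance(declaration, str) else list(declaration)
        for name in names:
            if name not in combined:
                combined.append(name)
    return combined


# Base schema first, then the derived schema's own declaration.
print(accumulate_child_schemas(["post_process", "export"], ["report", "post_process"]))
```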

CLI shape: bare tokens + options

Child-schemas appear on the CLI as bare tokens in kebab-case:

prog post-process --level 2
prog export --format yaml

Tokens are schema names rendered as:

schema name (snake_case): post_process
CLI token (kebab-case): post-process

Nested child-schemas (grandchild-schemas)

Example nesting:

root:
  child_schemas: [post_process]

post_process:
  child_schemas: [advanced]
  variables:
    level:
      type: int
      optional: true

advanced:
  child_schemas: [knobs]
  variables:
    adv:
      type: int
      optional: true

knobs:
  variables:
    k:
      type: int
      optional: true

CLI:

prog post-process --level 1 advanced --adv 2 knobs --k 3
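
To make the grammar concrete, here is a simplified sketch of how such a command line could be split into per-schema segments. It is not the actual parser: split_segments and CHILDREN are hypothetical, only exclusive descent is modelled, and every option is assumed to take exactly one value.

```python
CHILDREN = {
    "root": ["post_process"],
    "post_process": ["advanced"],
    "advanced": ["knobs"],
    "knobs": [],
}


def split_segments(argv, children, root="root"):
    """Split argv into (schema_name, option_args) segments, descending on tokens."""
    segments = [(root, [])]
    index = 0
    while index < len(argv):
        arg = argv[index]
        current = segments[-1][0]
        if arg.startswith("-"):
            # Assumption for this sketch: each option consumes one value.
            segments[-1][1].extend(argv[index:index + 2])
            index += 2
            continue
        name = arg.replace("-", "_")  # kebab-case token -> snake_case schema name
        if name not in children.get(current, []):
            raise ValueError(f"Invalid command {arg!r} while configuring {current!r}.")
        segments.append((name, []))
        index += 1
    return segments


segments = split_segments(
    "post-process --level 1 advanced --adv 2 knobs --k 3".split(), CHILDREN
)
```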

`child_schema_mode`: `exclusive` vs `inclusive`

Each schema may set:

child_schema_mode: exclusive   # default
# or
child_schema_mode: inclusive

Meaning:

exclusive (default): once you descend into a child schema, you may only enter its descendants next. You cannot jump to a sibling token of the parent while inside the child.
inclusive: while inside a child of this schema, you may jump directly to another child of the same schema (a one-level sibling jump). The mode is read from the parent: it is the parent schema that must be declared inclusive.

This is intentionally limited to one level; the parser does not walk up multiple ancestors to search for siblings.
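
One way to picture the two modes is a function that lists the tokens allowed next, given the path from the root to the schema currently being configured. This is a sketch under assumptions: allowed_tokens is a hypothetical name, and excluding the current schema from its own sibling list is a guess rather than documented behavior.

```python
def allowed_tokens(path, children, modes):
    """Return the CLI tokens that may follow, given the root-to-current path."""
    current = path[-1]
    allowed = list(children.get(current, []))
    if len(path) >= 2:
        parent = path[-2]
        # Only the direct parent is consulted: no walk up further ancestors.
        if modes.get(parent, "exclusive") == "inclusive":
            for sibling in children.get(parent, []):
                # Assumption: re-entering the current schema is not offered.
                if sibling != current and sibling not in allowed:
                    allowed.append(sibling)
    return [name.replace("_", "-") for name in allowed]


children = {"root": ["post_process", "export"], "post_process": ["advanced"]}
print(allowed_tokens(["root", "post_process"], children, {}))
print(allowed_tokens(["root", "post_process"], children, {"root": "inclusive"}))
```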

Error messages for invalid tokens

When a user types a token that exists somewhere in the schema tree but is invalid in the current context, SchemaCommandLine raises a DSUserError that is explicit about categories:

what schema you were configuring,
that the invalid thing is a command token (not an option),
what you are allowed to do next (set options vs enter child-commands).

Example form:

Invalid command 'knobs' while configuring 'post_process'.

Here you may:
  • set options for 'post_process' (e.g. --level)
  • enter child-commands of 'post_process': advanced

Help listing for child-schemas

When child-schemas are present, help output includes a dedicated group listing tokens (not flags). The group label is configurable per schema via:

child_schema_label: Commands

Each listed token can also show a short description if the referenced schema provides a description field.

Tab completion (argcomplete)

When argcomplete is installed and active:

At a given point in the command line, completion suggests either:
- child-schema tokens valid in the current context, or
- options for the current context schema.
Completion is context-aware and supports arbitrary nesting depth.

Important: child-schemas are implemented without argparse subparsers. Completion uses a completion-only parser plus a grammar-aware context resolver, so you still get clean help output and deterministic behavior.

Data merging with child-schemas

Child-schema data may come from YAML input files and from CLI segments.

When both provide values for the same child-schema block:

YAML provides the baseline mapping for that block
CLI overrides keys inside that block

This behavior is implemented by merging nested mappings for each child-schema name before SchemaProjector validation/typing.
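
A minimal sketch of that merge, assuming child-schema data is stored as one nested mapping per child-schema name. merge_child_blocks is a hypothetical helper, the "indent" key is invented for the example, and only one nesting level is handled here.

```python
def merge_child_blocks(yaml_data, cli_data, child_names):
    """Merge CLI data over YAML data, merging child-schema blocks key by key."""
    merged = dict(yaml_data)
    for key, value in cli_data.items():
        baseline = merged.get(key)
        if key in child_names and isinstance(value, dict) and isinstance(baseline, dict):
            block = dict(baseline)   # YAML provides the baseline mapping
            block.update(value)      # CLI overrides individual keys
            merged[key] = block
        else:
            merged[key] = value      # plain variables: CLI simply wins
    return merged


yaml_data = {"export": {"format": "json", "indent": 2}}
cli_data = {"export": {"format": "yaml"}}
print(merge_child_blocks(yaml_data, cli_data, {"export", "post_process"}))
```

A plain dict.update() at the top level would replace the whole export block and lose the YAML-only key, which is why the child-schema blocks are merged separately.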

Data sources and precedence

SchemaCommandLine can merge values from up to two sources:

YAML input files (optional, via an “input files” variable in the schema)
CLI options (always available)

Precedence is:

YAML input files (lowest)
CLI options (highest)

This is implemented in _merge_sources():

raw_data = dict(self._input_file_data)
raw_data.update(self._cli_data)

So a user can keep defaults in YAML and override with CLI flags.

YAML input files

YAML input files are enabled by defining a schema variable whose name is given by the input_files_tag constructor argument (default "input_files"):

variable must exist in schema vars
value is expected to be a list of file paths (often via string-list)
each YAML file must parse to a mapping/dict
merged sequentially (later files override earlier)

stdin can be used by passing "-" as a YAML input file path (meaning read YAML from stdin).

Important restriction: you cannot use "-" for both required_file and YAML input files in the same invocation.
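
The rules above can be sketched without any YAML library by working on documents that have already been parsed. Both function names are hypothetical, and the built-in exceptions stand in for whatever error type the real implementation raises.

```python
def merge_input_documents(documents):
    """Merge parsed YAML documents in order; later documents override earlier ones."""
    merged = {}
    for position, document in enumerate(documents, start=1):
        if not isinstance(document, dict):
            raise TypeError(f"input file #{position} did not parse to a mapping")
        merged.update(document)
    return merged


def check_stdin_usage(required_file, input_files):
    """Reject invocations that ask for stdin ('-') in both places."""
    if required_file == "-" and "-" in input_files:
        raise ValueError("'-' cannot be used for both required_file and YAML input files")


print(merge_input_documents([{"a": 1, "b": 2.5}, {"b": 3.0}]))
```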

required_file metadata

A command may declare a required positional file via schema metadata:

required_file:
  enabled: true
  metavar: FILE
  help: Required input file (use '-' for stdin).
  read_mode: path      # path|text|binary
  stdin_ok: true

Behavior

If enabled: true, argparse adds a positional argument required_file.
How the file is consumed depends on read_mode:
- path: do not read content (only store the path); required_file_content is None
- text: read text content into required_file_content
- binary: read bytes into required_file_content
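
The three read modes can be sketched with the standard library as follows. consume_required_file is a hypothetical name, and the UTF-8 encoding and the handling of "-" are assumptions of this sketch rather than documented behavior.

```python
import sys
from pathlib import Path


def consume_required_file(path, read_mode="path"):
    """Return the content to store for a required file, or None in path mode."""
    if read_mode == "path":
        return None  # only the path itself is kept
    if read_mode == "text":
        if path == "-":
            return sys.stdin.read()
        return Path(path).read_text(encoding="utf-8")  # encoding is an assumption
    if read_mode == "binary":
        if path == "-":
            return sys.stdin.buffer.read()
        return Path(path).read_bytes()
    raise ValueError(f"unknown read_mode: {read_mode!r}")
```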

apply_as_schema_data

If apply_as_schema_data: true, the required file is always read as text and parsed as YAML. The resulting mapping is merged into schema input data at the “required_file” precedence level.

This is useful for commands where the primary input is a YAML block but you want it as a required positional rather than --input-file.

Accessors

required_file_path
required_file_content

Unknown options and suggestions

SchemaCommandLine parses with parse_known_args() to detect unknown options and provide better errors.

Unknown options beginning with - raise DSUnknownNameError with “did you mean” suggestions.
Unexpected positional arguments raise DSUserError.

Suggestions are computed using close matches over normalized option names (including aliases).
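
A "did you mean" lookup of this kind can be sketched with difflib from the standard library. The normalization rules, the cutoff of 0.6, and the name suggest_options are assumptions made for the example, not the library's actual settings.

```python
import difflib


def suggest_options(unknown, known_options, limit=3):
    """Return known option strings that closely match an unknown option."""

    def normalize(option):
        return option.lstrip("-").replace("_", "-").lower()

    by_normalized = {normalize(option): option for option in known_options}
    matches = difflib.get_close_matches(
        normalize(unknown), list(by_normalized), n=limit, cutoff=0.6
    )
    return [by_normalized[match] for match in matches]


known = ["--iterations", "--num-iterations", "-n", "--level"]
print(suggest_options("--iteratons", known))
```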

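Compared with argparse's default "unrecognized arguments" message, the user is pointed at the closest valid spellings instead of having to consult the help output.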

target_class and object instantiation

If target_class is provided, SchemaCommandLine will:

validate/type inputs using SchemaProjector
call SchemaProjector.get_instance(target_class)
expose the instantiated object via target_obj

Constructor kwargs are filtered by the target class __init__ signature (unknown kwargs ignored).
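
Filtering by constructor signature can be sketched with inspect.signature. The helper name filter_kwargs and the Cell class are hypothetical, and passing everything through when __init__ accepts **kwargs is an assumption of this sketch.

```python
import inspect


def filter_kwargs(cls, data):
    """Keep only the entries of data that cls.__init__ can accept by name."""
    parameters = inspect.signature(cls.__init__).parameters
    accepts_any = any(
        parameter.kind is inspect.Parameter.VAR_KEYWORD
        for parameter in parameters.values()
    )
    if accepts_any:
        return dict(data)  # assumption: **kwargs means nothing needs dropping
    return {key: value for key, value in data.items() if key in parameters and key != "self"}


class Cell:
    def __init__(self, a, b=0.0):
        self.a = a
        self.b = b


cell = Cell(**filter_kwargs(Cell, {"a": 1, "b": 2.5, "verbose": True}))
```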

This makes schemas usable both for:

configuration validation
object construction

Extending behavior: `_operations()`

Override _operations() in subclasses to implement custom logic.

At the time _operations() runs:

self.data is available (typed dict)
self.target_obj may be available (if target_class provided)
required_file state is available

A common pattern is:

compute derived quantities
write output files
call domain-specific libraries

print-attribute

print_attribute is a special schema feature that enables a built-in CLI option:

--print-attribute ...

This option lets users request printing one or more @property attributes from the instantiated target object.

It is designed for:

interactive exploration
debugging / inspection
reproducible output (printed in YAML-style using render_mapping)

Requirements

print-attribute requires a target_class
print-attribute is enabled and configured by schema descriptor configuration (not per-variable)

When enabled, SchemaCommandLine injects a reserved variable into the schema:

variable name: print_attribute (reserved; users must not define it)
type: string-list
optional: true

This injection ensures argparse sees a --print-attribute option without requiring you to bake it into every schema by hand.

Where configuration comes from

SchemaCommandLine resolves print-attribute configuration via:

print_attribute = get_schema_print_attribute(schema_name, schema_definitions)

This supports inheritance and merging rules defined by the schema resolver.

The resolved value may be:

False / None: disabled
True: enabled, auto mode
a dict: enabled, with advanced configuration

Print-attribute modes

After resolution, SchemaCommandLine normalizes the configuration into one of two modes:

Auto mode

Auto mode derives choices by introspecting the target class:

all public @property names (including inherited)
excluding private names starting with _

Those internal property names are then mapped to external names presented on the CLI.
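
The introspection step of auto mode can be approximated with inspect.getmembers. This is a sketch of the idea, not the library's code; public_property_names and the example classes are hypothetical.

```python
import inspect


def public_property_names(cls):
    """Return the sorted names of public @property attributes, inherited ones included."""
    members = inspect.getmembers(cls, lambda member: isinstance(member, property))
    return [name for name, _ in members if not name.startswith("_")]


class Base:
    @property
    def energy(self):
        return -12.345


class Cell(Base):
    @property
    def volume(self):
        return 60.0

    @property
    def _cache(self):
        return {}

    def describe(self):  # a regular method, not a property
        return "cell"


print(public_property_names(Cell))
```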

Manual mode

Manual mode uses an explicit list of external choices provided by schema configuration:

no class introspection is used to produce the list
values are still mapped to internal attribute names before reading from the object

Manual mode is selected when the resolved configuration dict contains a choices key and it is not null.

print-attribute: external vs internal names

This is the most important conceptual point:

The CLI accepts and displays external names only.
The underlying object is accessed using internal attribute names.

DataSchemer supports two mapping systems that may both apply:

Schema variable code_alias (general DataSchemer feature)
print_attribute.code_alias (specific to print-attribute)

Both contribute to mapping external → internal attribute names for print-attribute.

Schema `code_alias` interaction

Schema variable definitions may rename how constructor arguments map to the object:

variables:
  a:
    type: int
    code_alias: x

For print-attribute, SchemaCommandLine builds alias maps from schema vars:

internal → external
external → internal

and merges them with print-attribute’s explicit code-alias mapping (below).

print_attribute.code_alias

In dict form, print-attribute may define:

print_attribute:
  code_alias:
    external_name: internal_name

This affects only print-attribute resolution (not object construction).

Disjointness requirement

These two alias maps must be disjoint (no overlapping keys); otherwise the external → internal mapping would be ambiguous. The implementation validates disjointness using merge_disjoint_maps(...) and raises DSUserError if they overlap.
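
A disjointness check of this kind can be sketched in a few lines. The function below is a stand-in written for this page: it is not the library's merge_disjoint_maps, whose signature is not shown here, and ValueError takes the place of DSUserError.

```python
def merge_disjoint(first, second):
    """Merge two mappings, refusing to proceed if they share any key."""
    overlap = sorted(set(first) & set(second))
    if overlap:
        raise ValueError(f"alias maps overlap on: {', '.join(overlap)}")
    merged = dict(first)
    merged.update(second)
    return merged


schema_aliases = {"a": "x"}            # from schema variable code_alias
print_aliases = {"total": "_total"}    # from print_attribute.code_alias
print(merge_disjoint(schema_aliases, print_aliases))
```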

print-attribute: choices

In dict form, print-attribute can constrain and control the allowed --print-attribute values.

Manual choices

print_attribute:
  choices: ["energy", "volume"]

This:

forces manual mode
restricts CLI candidates strictly to those external spellings
drives tab-completion candidates (when argcomplete is installed)

If choices: null, then:

the configuration remains enabled
but it stays in auto mode
and (crucially) exclude is suppressed because the choices key is present (see exclude semantics below)

This “choices key present” rule is intentional: it lets inheritance signal “explicit configuration” even when the final list is not specified.

Runtime injection of choices for argparse

To support argparse choices and tab-completion, SchemaCommandLine dynamically injects the computed choices into the injected print_attribute variable only at parser construction time.

print-attribute: exclude

In dict form, print-attribute may define an exclude list:

print_attribute:
  exclude: ["debug", "internal_state"]

Exclude is applied only in true auto mode, meaning:

the effective dict does not contain a choices key (including inherited configs)

If choices is present (even as null), exclude is ignored by design.

Exclude works on external spellings only:

if you exclude "foo", it will remove both the internal and external names that match "foo" from the auto-derived list

Consistency between `choices` and `ext2int`

In auto mode, exclude removes entries from:

the completion/validation choices list
the external→internal mapping (ext2int)

This prevents excluded names from reappearing via alias mappings.
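
The interaction between choices and exclude can be summarised in a short sketch. resolve_choices is a hypothetical function, and leaving ext2int untouched in manual mode is an assumption; only the "choices key present" rule and the removal from both structures come from the text above.

```python
def resolve_choices(config, auto_names, ext2int):
    """Return (choices, ext2int) for an effective print_attribute config dict."""
    if config.get("choices") is not None:
        return list(config["choices"]), dict(ext2int)  # manual mode

    choices = list(auto_names)
    mapping = dict(ext2int)
    if "choices" not in config:  # true auto mode: exclude applies
        for name in config.get("exclude", []):
            if name in choices:
                choices.remove(name)
            mapping.pop(name, None)  # keep the mapping consistent with choices
    return choices, mapping


auto_names = ["energy", "volume", "debug"]
ext2int = {"energy": "energy", "volume": "vol", "debug": "debug"}
print(resolve_choices({"exclude": ["debug"]}, auto_names, ext2int))
```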

print-attribute: omit_key

In dict form, print-attribute may define an omit_key flag:

print_attribute:
  omit_key: true

If omit_key is true, SchemaCommandLine prints the raw attribute value (not as name: value mapping).

If omit_key is false (default), output is rendered using render_mapping({external_name: value}, ...).

print-attribute: output formatting controls

SchemaCommandLine also supports optional output-format controls for print-attribute:

precision_from
array_style_from

These are optional keys inside the print_attribute config dict. Each key names a schema variable. If the user supplies that variable, SchemaCommandLine uses it to control how print-attribute output is rendered.

`precision_from`

print_attribute:
  precision_from: precision

variables:
  precision:
    type: int
    optional: true
    help: Output precision for print-attribute

Behavior:

If the user supplies precision, SchemaCommandLine validates it as an integer and updates the instance precision.
If the user does not supply it, SchemaCommandLine keeps the constructor precision (default 8).

`array_style_from`

print_attribute:
  array_style_from: array_style

variables:
  array_style:
    type: string
    optional: true
    choices: ["clean", "bare"]
    help: Array style for print-attribute output

Behavior:

If the user supplies array_style, SchemaCommandLine passes it to render_mapping(..., array_style=...) for print-attribute output.
If the user does not supply it, SchemaCommandLine uses the default array-style behavior of render_mapping.

print-attribute and inheritance: practical mental model

Because schema-level print_attribute is inherited and merged, treat it like a small “policy” object:

A base schema can enable print-attribute broadly.
A derived schema can:
- add code-alias mappings
- extend choices
- add exclusions (in auto mode)
- add output-format controls (precision_from, array_style_from)
- or disable print-attribute entirely (false)

The resolver and normalizer together ensure that:

the CLI only ever exposes external spellings
printing uses render_mapping({external_name: value}, ...)
and mapping is deterministic.

How print-attribute output looks

When --print-attribute is used, SchemaCommandLine prints each requested attribute in YAML style:

energy: -12.345
volume: 60.0

Internally, it prints one mapping per attribute, so output remains stream-friendly.

If the external name cannot be resolved to an attribute on the object, a DSUserError is raised with details (including internal name).

Tab completion with argcomplete

If argcomplete is installed, SchemaCommandLine enables completion via:

argcomplete.autocomplete(parser)

For --print-attribute, completion candidates are the computed external choices.

The canonical helper for computing them is:

compute_print_attribute_choices(...)

Downstream tools (like PM) should call this helper rather than duplicating logic.

Summary

SchemaCommandLine provides a practical, batteries-included way to build CLIs from schemas:

schema → argparse options (+ help/aliases/choices)
strict parsing + helpful suggestions
merge YAML + required_file + CLI with clear precedence
validate/type using SchemaProjector
optionally instantiate an object
optionally print introspected attributes using a carefully designed external/internal mapping system

The print-attribute feature is intentionally rich because it lives at the boundary between schema naming, object naming, and user-visible CLI affordances.

Last modified February 8, 2026: updated SCL docs with precision_from and array_style_from and fixed some errors. (f68279e)