SchemaCommandLine
SchemaCommandLine is DataSchemer’s schema-driven command-line engine.
It combines:
- argparse parsing (including optional argcomplete tab-completion)
- YAML input files (optional)
- an optional required positional file (per-command metadata)
- SchemaProjector validation and typing
- optional object instantiation (target_class)
- an optional print-attribute facility for introspecting results
Where SchemaProjector is the typing/validation core, SchemaCommandLine is the CLI orchestration layer.
High-level flow
When you construct a SchemaCommandLine, it performs all work immediately:
- Resolve the schema form/name (normalize_schema_inputs)
- Resolve the print_attribute configuration (with inheritance)
- Build schema variables (including inheritance/copy/update/delete)
- Build an argparse parser from schema variables
- Parse CLI args (strictly; allow_abbrev=False)
- Optionally load YAML input files
- Optionally read a required positional file (and optionally parse it as YAML)
- Merge all data sources into a single raw input mapping
- Validate and type the data via SchemaProjector
- Optionally instantiate target_class
- Run _operations() (override point for custom logic)
- Optionally print attributes requested via --print-attribute
This “do everything in __init__” approach keeps the class simple to embed:
construct it once and you either get a successful run or a structured user error.
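The "do everything in __init__" design is essentially the template-method pattern. The sketch below shows that pattern with the standard library only. It is not DataSchemer's code. MiniCommandLine, its step methods and UserError (standing in for DSUserError) are hypothetical, and only a few of the real steps are mimicked: parsing, merging, typing and the _operations() hook.

```python
import argparse


class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


class MiniCommandLine:
    """Hypothetical sketch: every step runs inside __init__."""

    variables: dict = {}  # variable name -> Python type used for typing

    def __init__(self, argv, yaml_data=None):
        self._parser = self._build_parser()
        self._cli_data = self._parse(argv)
        self._input_file_data = dict(yaml_data or {})  # stands in for YAML files
        raw = self._merge_sources()
        self.data = self._validate_and_type(raw)
        self._operations()  # override point, runs last

    def _build_parser(self):
        parser = argparse.ArgumentParser(allow_abbrev=False)
        for name in self.variables:
            # argparse does not enforce requiredness; validation does.
            parser.add_argument("--" + name.replace("_", "-"), dest=name)
        return parser

    def _parse(self, argv):
        ns, unknown = self._parser.parse_known_args(argv)
        if unknown:
            raise UserError(f"unrecognized arguments: {unknown}")
        # Keep only the options the user actually supplied.
        return {k: v for k, v in vars(ns).items() if v is not None}

    def _merge_sources(self):
        raw = dict(self._input_file_data)  # lowest precedence
        raw.update(self._cli_data)  # CLI overrides
        return raw

    def _validate_and_type(self, raw):
        missing = [n for n in self.variables if n not in raw]
        if missing:
            raise UserError(f"missing required variables: {missing}")
        return {n: t(raw[n]) for n, t in self.variables.items()}

    def _operations(self):
        pass


class AddCLI(MiniCommandLine):
    variables = {"a": int, "b": float}

    def _operations(self):
        self.result = self.data["a"] + self.data["b"]


cli = AddCLI(["--b", "2.5"], yaml_data={"a": "1"})
print(cli.result)  # 3.5
```

Because all work happens in the constructor, callers only need to handle two outcomes: a finished object or a raised user error.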
Basic example
The typical usage pattern is to subclass SchemaCommandLine (optionally setting a few class attributes) and construct it with a schema:
from data_schemer.schema_command_line import SchemaCommandLine


class MyCLI(SchemaCommandLine):
    pass


schema = {
    "variables": {
        "a": {"type": "int"},
        "b": {"type": "float"},
    }
}

# Equivalent of: MyCLI(schema_definitions=schema, argv=[...])
MyCLI(schema, argv=["--a", "1", "--b", "2.5"])
In practice, projects usually supply schema_definitions loaded from YAML, and set
a target_class so a typed object is available in _operations().
Entry point: main()
SchemaCommandLine.main(..., debug: bool = False) -> int
main() is a convenience wrapper that:
- returns a process exit code (0 on success)
- catches DSUserError and prints a friendly message to stderr
- suppresses tracebacks unless debug=True
This is the recommended entry point for console_scripts:
import sys

if __name__ == "__main__":
    raise SystemExit(MyCLI.main(schema, debug="--debug" in sys.argv))
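The exit-code contract of main() can be sketched with the standard library alone.

- run_main and UserError (standing in for DSUserError) are hypothetical.
- The value 1 for a user error is an assumption. The documented contract is only "0 on success".
- The real main() also constructs the command-line object for you.

```python
import sys
import traceback


class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def run_main(run, debug=False):
    """Call run() and translate the outcome into a process exit code.

    Only UserError is caught here; other exceptions propagate unchanged.
    """
    try:
        run()
    except UserError as exc:
        if debug:
            traceback.print_exc()  # full traceback only in debug mode
        print(f"error: {exc}", file=sys.stderr)  # friendly one-line message
        return 1  # assumed non-zero code for user errors
    return 0


def failing():
    raise UserError("missing required option --a")


print(run_main(lambda: None))  # 0
print(run_main(failing))  # 1, after printing the message to stderr
```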
Schema-driven argparse
SchemaCommandLine builds an argparse parser from the resolved schema variables using:
schema_to_argparse(schema_vars, description=..., masquerade_usage=True)
Key behaviors
- Strict parsing: allow_abbrev=False avoids ambiguous or accidental option matches.
- No argparse-required: argparse does not enforce required arguments; the projector does.
- Schema is the contract: help, metavar, aliases, choices and bool flags are schema-driven.
- Improved UX: required variables get a (required) marker in colored help (TTY only).
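The effect of allow_abbrev=False can be seen with plain argparse, independently of DataSchemer. The parser below is hand-built for illustration. SchemaCommandLine would generate the equivalent from schema variables.

```python
import argparse

parser = argparse.ArgumentParser(prog="prog", allow_abbrev=False)
# Not required=True: requiredness is left to the validation layer.
parser.add_argument("--iterations", type=int)

ns, unknown = parser.parse_known_args(["--iterations", "3"])
print(ns.iterations, unknown)  # 3 []

# With allow_abbrev=False, the prefix --iter is NOT expanded to --iterations;
# it is reported back as an unknown argument instead.
ns, unknown = parser.parse_known_args(["--iter", "3"])
print(ns.iterations, unknown)
```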
Variable → CLI option mapping
A schema variable named foo_bar becomes:
--foo-bar
Aliases may be provided via alias:
- if an alias starts with -, it is used as-is (short option)
- otherwise it becomes --alias-name
Example:
variables:
  iterations:
    type: int
    alias: ["-n", "num_iterations"]
This yields options:
-n, --num-iterations, --iterations
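The two mapping rules above can be written as a small function. option_strings is a hypothetical helper, not part of DataSchemer's API. The output order was chosen to match the list above and is not necessarily the library's order.

```python
def option_strings(name, aliases=()):
    """Hypothetical helper: map a schema variable and its aliases to CLI flags."""

    def to_flag(token):
        # Aliases starting with '-' are used verbatim (e.g. short options).
        if token.startswith("-"):
            return token
        # Otherwise snake_case becomes a kebab-case long option.
        return "--" + token.replace("_", "-")

    flags = [to_flag(a) for a in aliases]
    flags.append(to_flag(name))  # canonical long option derived from the name
    return flags


print(option_strings("iterations", ["-n", "num_iterations"]))
# ['-n', '--num-iterations', '--iterations']
print(option_strings("foo_bar"))  # ['--foo-bar']
```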
Child-schemas
Child-schemas let a schema define a nested command grammar using bare tokens (not flags), with schema-driven options available at each level. The child-schema feature is designed to be:
- deterministic (schema-declared ordering; no argparse subparsers)
- strict (unknown tokens/options are errors)
- nestable (grandchild-schemas and deeper)
- completion-friendly (context-aware token/option completion when argcomplete is installed)
This section documents the current implementation behavior.
Declaring child-schemas
A schema may declare a list of child-schemas using child_schemas:
root:
  variables:
    # root options...
  child_schemas: [post_process, export]
post_process:
  description: Post-processing controls
  variables:
    level:
      type: int
      optional: true
export:
  description: Export controls
  variables:
    format:
      type: string
      optional: true
      choices: [json, yaml]
Notes:
- child_schemas may be a string or a list of strings.
- Child-schemas accumulate through inheritance (bases first, then child), with duplicates removed while preserving first-occurrence order.
- Child-schemas may themselves declare child_schemas, allowing arbitrary nesting depth.
- Child-schemas must not define required_file (enforced by SchemaProjector schema validation).
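The accumulation rule can be sketched in a few lines.

- accumulate_child_schemas is a hypothetical helper, not DataSchemer's resolver.
- It assumes the caller has already collected each schema's child_schemas value along the inheritance chain, ordered bases first.

```python
def accumulate_child_schemas(chain):
    """Hypothetical sketch: merge child_schemas along an inheritance chain.

    chain is ordered bases first, then the derived schema. Each entry may be
    None (not declared), a single string, or a list of strings.
    """
    merged = []
    for value in chain:
        if value is None:
            continue
        if isinstance(value, str):
            value = [value]
        merged.extend(value)
    # dict.fromkeys drops duplicates while keeping first-occurrence order.
    return list(dict.fromkeys(merged))


print(accumulate_child_schemas([["post_process", "export"], "export", ["report"]]))
# ['post_process', 'export', 'report']
```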
CLI shape: bare tokens + options
Child-schemas appear on the CLI as bare tokens in kebab-case:
prog post-process --level 2
prog export --format yaml
Tokens are schema names rendered as:
- schema name (snake_case): post_process
- CLI token (kebab-case): post-process
Nested child-schemas (grandchild-schemas)
Example nesting:
root:
  child_schemas: [post_process]
post_process:
  child_schemas: [advanced]
  variables:
    level:
      type: int
      optional: true
advanced:
  child_schemas: [knobs]
  variables:
    adv:
      type: int
      optional: true
knobs:
  variables:
    k:
      type: int
      optional: true
CLI:
prog post-process --level 1 advanced --adv 2 knobs --k 3
child_schema_mode: exclusive vs inclusive
Each schema may set:
child_schema_mode: exclusive # default
# or
child_schema_mode: inclusive
Meaning:
- exclusive (default): once you descend into a child schema, you may only enter its descendants next. You cannot jump to a sibling token of the parent while inside the child.
- inclusive: enables a one-level sibling jump. While inside one child of an inclusive schema, you may jump directly to another child of that same schema (it is the parent that must be inclusive).
This is intentionally limited to one level; the parser does not walk up multiple ancestors to search for siblings.
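The token grammar described above can be sketched as a small segmenter that splits argv into one segment per schema. This is a hypothetical illustration of the rules, not DataSchemer's parser.

- segment_argv, its arguments and UserError (standing in for DSUserError) are assumptions.
- A real implementation must also cope with option values that happen to spell a schema token.

```python
class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def segment_argv(argv, children, modes=None, root="root"):
    """Hypothetical sketch: split argv into (schema_name, options) segments.

    children maps a schema name to its declared child_schemas; modes maps a
    schema name to "exclusive" (the default) or "inclusive".
    """
    modes = modes or {}
    known = {c for kids in children.values() for c in kids}
    parent_of = {root: None}
    segments = [(root, [])]
    current = root
    for arg in argv:
        name = arg.replace("-", "_")  # kebab-case token -> schema name
        # Options and their values stay with the current segment.
        if arg.startswith("-") or name not in known:
            segments[-1][1].append(arg)
            continue
        parent = parent_of[current]
        if name in children.get(current, []):
            parent_of[name] = current  # descend one level
        elif (
            parent is not None
            and modes.get(parent, "exclusive") == "inclusive"
            and name in children.get(parent, [])
        ):
            parent_of[name] = parent  # one-level sibling jump only
        else:
            raise UserError(f"Invalid command {arg!r} while configuring {current!r}.")
        current = name
        segments.append((name, []))
    return segments


tree = {"root": ["post_process"], "post_process": ["advanced"], "advanced": ["knobs"]}
print(segment_argv("post-process --level 1 advanced --adv 2 knobs --k 3".split(), tree))
# [('root', []), ('post_process', ['--level', '1']), ('advanced', ['--adv', '2']), ('knobs', ['--k', '3'])]
```

Each segment could then be parsed against the options of its own schema. Passing modes={"root": "inclusive"} would let post-process be followed directly by a sibling such as export.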
Error messages for invalid tokens
When a user types a token that exists somewhere in the schema tree but is invalid in the current context,
SchemaCommandLine raises a DSUserError that is explicit about categories:
- what schema you were configuring,
- that the invalid thing is a command token (not an option),
- what you are allowed to do next (set options vs enter child-commands).
Example form:
Invalid command 'knobs' while configuring 'post_process'.
Here you may:
• set options for 'post_process' (e.g. --level)
• enter child-commands of 'post_process': advanced
Help listing for child-schemas
When child-schemas are present, help output includes a dedicated group listing tokens (not flags). The group label is configurable per schema via:
child_schema_label: Commands
Each listed token can also show a short description if the referenced schema provides a description field.
Tab completion (argcomplete)
When argcomplete is installed and active:
- At a given point in the command line, completion suggests either:
  - child-schema tokens valid in the current context, or
  - options for the current context schema.
- Completion is context-aware and supports arbitrary nesting depth.
Important: child-schemas are implemented without argparse subparsers. Completion uses a completion-only parser plus a grammar-aware context resolver, so you still get clean help output and deterministic behavior.
Data merging with child-schemas
Child-schema data may come from YAML input files and from CLI segments.
When both provide values for the same child-schema block:
- YAML provides the baseline mapping for that block
- CLI overrides keys inside that block
This behavior is implemented by merging nested mappings for each child-schema name before SchemaProjector validation/typing.
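The per-block merge can be sketched as follows.

- merge_sources is a hypothetical name.
- The sample data values are illustrative.
- The sketch merges only one level of child-schema nesting.

```python
def merge_sources(yaml_data, cli_data, child_names):
    """Hypothetical sketch: CLI wins, but child-schema blocks merge key by key."""
    merged = dict(yaml_data)
    for key, value in cli_data.items():
        if key in child_names and isinstance(merged.get(key), dict) and isinstance(value, dict):
            # The YAML block is the baseline; CLI keys override inside it.
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value  # ordinary keys: CLI simply replaces YAML
    return merged


yaml_data = {"precision": 4, "export": {"format": "json", "path": "out.json"}}
cli_data = {"export": {"format": "yaml"}}
print(merge_sources(yaml_data, cli_data, {"export"}))
# {'precision': 4, 'export': {'format': 'yaml', 'path': 'out.json'}}
```

A plain dict.update would have replaced the whole export block and lost path. Merging inside the block keeps the YAML baseline.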
Data sources and precedence
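SchemaCommandLine merges values from two primary sources (a required positional file parsed with apply_as_schema_data, described below, adds its own precedence level):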
- YAML input files (optional, via an “input files” variable in the schema)
- CLI options (always available)
Precedence is:
- YAML input files (lowest)
- CLI options (highest)
This is implemented in _merge_sources():
raw_data = dict(self._input_file_data)
raw_data.update(self._cli_data)
So a user can keep defaults in YAML and override with CLI flags.
YAML input files
YAML input files are enabled by defining a schema variable whose name is given by the input_files_tag constructor argument (default "input_files"):
- the variable must exist in the schema vars
- its value is expected to be a list of file paths (often via string-list)
- each YAML file must parse to a mapping/dict
- files are merged sequentially (later files override earlier ones)
stdin can be used by passing "-" as a YAML input file path (meaning read YAML from stdin).
Important restriction: you cannot use "-" for both required_file and YAML input files in the same invocation.
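The sequential merge and the stdin restriction can be sketched without a YAML library by injecting the loader.

- load_input_files, its signature and UserError (standing in for DSUserError) are hypothetical.
- With PyYAML installed, load_yaml could wrap yaml.safe_load. The example uses an in-memory dict instead.

```python
class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def load_input_files(paths, load_yaml, required_file=None):
    """Hypothetical sketch: merge YAML input files, later files winning.

    load_yaml is any callable mapping a path ("-" meaning stdin) to parsed YAML.
    """
    if required_file == "-" and "-" in paths:
        raise UserError("'-' (stdin) cannot be used for both required_file and input files")
    merged = {}
    for path in paths:
        data = load_yaml(path)
        if not isinstance(data, dict):
            raise UserError(f"{path}: YAML input file must contain a mapping")
        merged.update(data)  # later files override earlier ones
    return merged


fake_files = {"base.yaml": {"a": 1, "b": 2.0}, "override.yaml": {"b": 3.5}}
print(load_input_files(["base.yaml", "override.yaml"], fake_files.__getitem__))
# {'a': 1, 'b': 3.5}
```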
required_file metadata
A command may declare a required positional file via schema metadata:
required_file:
  enabled: true
  metavar: FILE
  help: Required input file (use '-' for stdin).
  read_mode: path # path|text|binary
  stdin_ok: true
Behavior
- If enabled: true, argparse adds a positional argument required_file.
- How the file is consumed depends on read_mode:
  - path: do not read content (only store the path); required_file_content is None
  - text: read text content into required_file_content
  - binary: read bytes into required_file_content
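The read_mode switch can be sketched as below. read_required_file is hypothetical, and several details are assumptions rather than documented behavior:

- stdin_ok=False rejects "-".
- Text is decoded as UTF-8.
- In path mode nothing is read.

```python
import sys


def read_required_file(path, read_mode="path", stdin_ok=True):
    """Hypothetical sketch of read_mode handling; returns the content value."""
    if read_mode not in ("path", "text", "binary"):
        raise ValueError(f"unknown read_mode: {read_mode!r}")
    if path == "-" and not stdin_ok:
        raise ValueError("reading the required file from stdin is not allowed")
    if read_mode == "path":
        return None  # only the path is stored; content stays None
    if path == "-":
        # '-' means stdin: text stream or its underlying binary buffer.
        return sys.stdin.read() if read_mode == "text" else sys.stdin.buffer.read()
    if read_mode == "text":
        with open(path, encoding="utf-8") as handle:  # UTF-8 is an assumption
            return handle.read()
    with open(path, "rb") as handle:
        return handle.read()
```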
apply_as_schema_data
If apply_as_schema_data: true, the required file is always read as text and parsed as YAML.
The resulting mapping is merged into schema input data at the “required_file” precedence level.
This is useful for commands where the primary input is a YAML block but you want it
as a required positional rather than --input-file.
Accessors
- required_file_path
- required_file_content
Unknown options and suggestions
SchemaCommandLine parses with parse_known_args() to detect unknown options and provide better errors.
- Unknown options beginning with - raise DSUnknownNameError with “did you mean” suggestions.
- Unexpected positional arguments raise DSUserError.
Suggestions are computed using close matches over normalized option names (including aliases).
This is a major UX improvement over default argparse errors.
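Close-match suggestions of this kind can be computed with the standard library's difflib.

- suggest_options is hypothetical.
- The normalization shown is an assumption: leading dashes are stripped and underscores are treated as hyphens.
- DataSchemer's exact normalization and cutoff may differ.

```python
import difflib


def suggest_options(unknown, known_flags, n=3):
    """Hypothetical sketch: 'did you mean' candidates for an unknown option."""

    def normalize(flag):
        # Compare names without dashes and with '_' and '-' unified.
        return flag.lstrip("-").replace("_", "-").lower()

    by_norm = {normalize(f): f for f in known_flags}
    matches = difflib.get_close_matches(normalize(unknown), list(by_norm), n=n)
    return [by_norm[m] for m in matches]  # map back to CLI spellings


flags = ["-n", "--num-iterations", "--iterations", "--precision"]
print(suggest_options("--iteratons", flags))  # '--iterations' ranks first
```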
target_class and object instantiation
If target_class is provided, SchemaCommandLine will:
- validate/type inputs using SchemaProjector
- call SchemaProjector.get_instance(target_class)
- expose the instantiated object via target_obj
Constructor kwargs are filtered by the target class __init__ signature (unknown kwargs ignored).
This makes schemas usable both for:
- configuration validation
- object construction
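Filtering constructor kwargs by signature can be sketched with inspect.signature. filter_init_kwargs is hypothetical, and passing everything through when __init__ accepts **kwargs is an assumption, not documented behavior.

```python
import inspect


def filter_init_kwargs(cls, data):
    """Hypothetical sketch: keep only kwargs accepted by cls.__init__."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(data)  # assumption: a **kwargs __init__ accepts everything
    accepted = {
        name
        for name, p in params.items()
        if name != "self"
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in data.items() if k in accepted}


class Cell:
    def __init__(self, a, b=1.0):
        self.a, self.b = a, b


typed = {"a": 2, "b": 0.5, "print_attribute": ["volume"]}
obj = Cell(**filter_init_kwargs(Cell, typed))  # print_attribute is dropped
print(obj.a, obj.b)  # 2 0.5
```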
Extending behavior: _operations()
Override _operations() in subclasses to implement custom logic.
At the time _operations() runs:
- self.data is available (typed dict)
- self.target_obj may be available (if target_class is provided)
- required_file state is available
A common pattern is:
- compute derived quantities
- write output files
- call domain-specific libraries
print-attribute
print_attribute is a special schema feature that enables a built-in CLI option:
--print-attribute ...
This option lets users request printing one or more @property attributes from the instantiated target object.
It is designed for:
- interactive exploration
- debugging / inspection
- reproducible output (printed in YAML style using render_mapping)
Requirements
- print-attribute requires a target_class
- print-attribute is enabled and configured by schema descriptor configuration (not per-variable)
When enabled, SchemaCommandLine injects a reserved variable into the schema:
- variable name: print_attribute (reserved; users must not define it)
- type: string-list
- optional: true
This injection ensures argparse sees a --print-attribute option without requiring you
to bake it into every schema by hand.
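The injection step can be sketched as a pure function over the schema variables.

- inject_print_attribute is hypothetical.
- Rejecting a user-defined print_attribute even when the feature is disabled is an assumption based on "reserved; users must not define it".

```python
RESERVED = "print_attribute"


def inject_print_attribute(schema_vars, enabled):
    """Hypothetical sketch: add the reserved print_attribute variable."""
    if RESERVED in schema_vars:
        raise ValueError(f"'{RESERVED}' is reserved and must not be defined in a schema")
    injected = dict(schema_vars)  # copy; leave the caller's mapping untouched
    if enabled:
        injected[RESERVED] = {"type": "string-list", "optional": True}
    return injected


vars_ = inject_print_attribute({"a": {"type": "int"}}, enabled=True)
print(sorted(vars_))  # ['a', 'print_attribute']
```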
Where configuration comes from
SchemaCommandLine resolves print-attribute configuration via:
print_attribute = get_schema_print_attribute(schema_name, schema_definitions)
This supports inheritance and merging rules defined by the schema resolver.
The resolved value may be:
- False/None: disabled
- True: enabled, auto mode
- a dict: enabled, with advanced configuration
Print-attribute modes
After resolution, SchemaCommandLine normalizes the configuration into one of two modes:
Auto mode
Auto mode derives choices by introspecting the target class:
- all public @property names (including inherited)
- excluding private names starting with _
Those internal property names are then mapped to external names presented on the CLI.
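Auto-mode introspection can be sketched with inspect.getmembers. Looking the names up on the class returns the property objects themselves, including inherited ones. public_property_names and the example classes are hypothetical, and the sketch stops before the internal-to-external mapping step.

```python
import inspect


def public_property_names(cls):
    """Hypothetical sketch: public @property names of cls, including inherited."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, lambda m: isinstance(m, property))
        if not name.startswith("_")  # private names are skipped
    )


class Base:
    @property
    def volume(self):
        return 60.0


class Structure(Base):
    @property
    def energy(self):
        return -12.345

    @property
    def _cache(self):
        return {}


print(public_property_names(Structure))  # ['energy', 'volume']
```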
Manual mode
Manual mode uses an explicit list of external choices provided by schema configuration:
- no class introspection is used to produce the list
- values are still mapped to internal attribute names before reading from the object
Manual mode is selected when the resolved configuration dict contains a choices key and it is not null.
print-attribute: external vs internal names
This is the most important conceptual point:
- The CLI accepts and displays external names only.
- The underlying object is accessed using internal attribute names.
DataSchemer supports two mapping systems that may both apply:
- Schema variable code_alias (general DataSchemer feature)
- print_attribute.code_alias (specific to print-attribute)
Both contribute to mapping external → internal attribute names for print-attribute.
Schema code_alias interaction
Schema variable definitions may rename how constructor arguments map to the object:
variables:
  a:
    type: int
    code_alias: x
For print-attribute, SchemaCommandLine builds alias maps from schema vars:
- internal → external
- external → internal
and merges them with print-attribute’s explicit code-alias mapping (below).
print_attribute.code_alias
In dict form, print-attribute may define:
print_attribute:
  code_alias:
    external_name: internal_name
This affects only print-attribute resolution (not object construction).
Disjointness requirement
These two alias maps must be disjoint (no overlapping keys); otherwise the external → internal mapping would be ambiguous.
The implementation validates disjointness using merge_disjoint_maps(...)
and raises DSUserError if they overlap.
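Merging the two alias sources with a disjointness check can be sketched as follows. build_ext2int and UserError (standing in for DSUserError) are hypothetical, and the sketch does not call DataSchemer's merge_disjoint_maps.

```python
class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def build_ext2int(schema_vars, print_code_alias):
    """Hypothetical sketch: external -> internal names for print-attribute."""
    # Schema variables with code_alias: external (variable) name -> internal name.
    from_schema = {
        name: spec["code_alias"]
        for name, spec in schema_vars.items()
        if spec.get("code_alias")
    }
    overlap = from_schema.keys() & print_code_alias.keys()
    if overlap:
        raise UserError(f"ambiguous print-attribute aliases: {sorted(overlap)}")
    return {**from_schema, **print_code_alias}


schema_vars = {"a": {"type": "int", "code_alias": "x"}, "b": {"type": "int"}}
print(build_ext2int(schema_vars, {"energy": "total_energy"}))
# {'a': 'x', 'energy': 'total_energy'}
```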
print-attribute: choices
In dict form, print-attribute can constrain and control the allowed --print-attribute values.
Manual choices
print_attribute:
  choices: ["energy", "volume"]
This:
- forces manual mode
- restricts CLI candidates strictly to those external spellings
- drives tab-completion candidates (when argcomplete is installed)
If choices: null, then:
- the configuration remains enabled
- but it stays in auto mode
- and (crucially) exclude is suppressed because the choices key is present (see exclude semantics below)
This “choices key present” rule is intentional: it lets inheritance signal “explicit configuration” even when the final list is not specified.
Runtime injection of choices for argparse
To support argparse choices and tab-completion, SchemaCommandLine dynamically injects the computed choices into the injected print_attribute variable only at parser construction time.
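With plain argparse, injecting computed choices at construction time amounts to passing them to add_argument. build_parser is hypothetical, and nargs="+" is an assumed rendering of a string-list variable.

```python
import argparse


def build_parser(print_choices):
    """Hypothetical sketch: inject computed choices at parser-construction time."""
    parser = argparse.ArgumentParser(prog="prog", allow_abbrev=False)
    parser.add_argument(
        "--print-attribute",
        nargs="+",  # string-list: one or more names
        choices=print_choices,  # argparse validates each value
    )
    return parser


parser = build_parser(["energy", "volume"])
print(parser.parse_args(["--print-attribute", "energy", "volume"]).print_attribute)
# ['energy', 'volume']
```

Because the choices live on the argparse action, the same list drives both validation and argcomplete's candidates.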
print-attribute: exclude
In dict form, print-attribute may define an exclude list:
print_attribute:
  exclude: ["debug", "internal_state"]
Exclude is applied only in true auto mode, meaning:
- the effective dict does not contain a choices key (including inherited configs)
If choices is present (even as null), exclude is ignored by design.
Exclude entries are written using external spellings:
- if you exclude "foo", both the internal and external names that match "foo" are removed from the auto-derived list
Consistency between choices and ext2int
In auto mode, exclude removes entries from:
- the completion/validation choices list
- the external→internal mapping (ext2int)
This prevents excluded names from reappearing via alias mappings.
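The mode selection and exclude rules can be condensed into one hypothetical function.

- resolve_print_choices is not DataSchemer's normalizer.
- For simplicity it matches exclude entries against external names only.

```python
def resolve_print_choices(config, auto_names, ext2int):
    """Hypothetical sketch of mode selection and exclude handling.

    config is the resolved print_attribute value (False/None, True, or a dict);
    auto_names are external names derived from the target class.
    Returns (choices, ext2int), or (None, {}) when disabled.
    """
    if not config:
        return None, {}  # disabled
    if config is True:
        config = {}  # enabled, plain auto mode
    if config.get("choices") is not None:
        return list(config["choices"]), dict(ext2int)  # manual mode
    if "choices" in config:
        return list(auto_names), dict(ext2int)  # auto mode, exclude suppressed
    excluded = set(config.get("exclude") or [])
    # True auto mode: drop excluded names from BOTH choices and ext2int.
    choices = [n for n in auto_names if n not in excluded]
    kept = {e: i for e, i in ext2int.items() if e not in excluded}
    return choices, kept


names = ["debug", "energy", "volume"]
alias = {"energy": "total_energy"}
print(resolve_print_choices({"exclude": ["debug"]}, names, alias))
# (['energy', 'volume'], {'energy': 'total_energy'})
print(resolve_print_choices({"choices": None, "exclude": ["debug"]}, names, alias))
# (['debug', 'energy', 'volume'], {'energy': 'total_energy'})
```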
print-attribute: omit_key
In dict form, print-attribute may define an omit_key flag:
print_attribute:
  omit_key: true
If omit_key is true, SchemaCommandLine prints the raw attribute value (not as name: value mapping).
If omit_key is false (default), output is rendered using render_mapping({external_name: value}, ...).
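The omit_key switch can be sketched as follows.

- format_print_attribute is hypothetical.
- A plain f-string stands in for render_mapping, which handles nested values and arrays and is not reproduced here.

```python
def format_print_attribute(name, value, omit_key=False):
    """Hypothetical sketch: one print-attribute output record."""
    if omit_key:
        return str(value)  # raw value only
    # Stand-in for render_mapping({name: value}, ...).
    return f"{name}: {value}"


print(format_print_attribute("energy", -12.345))  # energy: -12.345
print(format_print_attribute("energy", -12.345, omit_key=True))  # -12.345
```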
print-attribute: output formatting controls
SchemaCommandLine also supports optional output-format controls for print-attribute:
- precision_from
- array_style_from
These are optional keys inside the print_attribute config dict. Each key names a schema variable.
If the user supplies that variable, SchemaCommandLine uses it to control how print-attribute output is rendered.
precision_from
print_attribute:
  precision_from: precision
variables:
  precision:
    type: int
    optional: true
    help: Output precision for print-attribute
Behavior:
- If the user supplies precision, SchemaCommandLine validates it as an integer and updates the instance precision.
- If the user does not supply it, SchemaCommandLine keeps the constructor precision (default 8).
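The precision lookup can be sketched as follows.

- resolve_precision and UserError (standing in for DSUserError) are hypothetical.
- Rejecting bool values, which are ints in Python, is an assumption.

```python
class UserError(Exception):
    """Hypothetical stand-in for DSUserError."""


def resolve_precision(data, config, default=8):
    """Hypothetical sketch: choose the print-attribute precision.

    config is the print_attribute dict; data is the typed user input.
    """
    var = config.get("precision_from")
    if var is None or data.get(var) is None:
        return default  # keep the constructor precision
    value = data[var]
    if isinstance(value, bool) or not isinstance(value, int):
        raise UserError(f"{var} must be an integer, got {value!r}")
    return value


cfg = {"precision_from": "precision"}
print(resolve_precision({"precision": 3}, cfg))  # 3
print(resolve_precision({}, cfg))  # 8
```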
array_style_from
print_attribute:
  array_style_from: array_style
variables:
  array_style:
    type: string
    optional: true
    choices: ["clean", "bare"]
    help: Array style for print-attribute output
Behavior:
- If the user supplies array_style, SchemaCommandLine passes it to render_mapping(..., array_style=...) for print-attribute output.
- If the user does not supply it, SchemaCommandLine uses the default array-style behavior of render_mapping.
print-attribute and inheritance: practical mental model
Because schema-level print_attribute is inherited and merged, treat it like a small “policy” object:
- A base schema can enable print-attribute broadly.
- A derived schema can:
  - add code-alias mappings
  - extend choices
  - add exclusions (in auto mode)
  - add output-format controls (precision_from, array_style_from)
  - or disable print-attribute entirely (false)
The resolver and normalizer together ensure that:
- the CLI only ever exposes external spellings
- printing uses render_mapping({external_name: value}, ...) (unless omit_key is true)
- and mapping is deterministic.
How print-attribute output looks
When --print-attribute is used, SchemaCommandLine prints each requested attribute in YAML style:
energy: -12.345
volume: 60.0
Internally, it prints one mapping per attribute, so output remains stream-friendly.
If the external name cannot be resolved to an attribute on the object,
a DSUserError is raised with details (including internal name).
Tab completion with argcomplete
If argcomplete is installed, SchemaCommandLine enables completion via:
argcomplete.autocomplete(parser)
For --print-attribute, completion candidates are the computed external choices.
The canonical helper for computing them is:
compute_print_attribute_choices(...)
Downstream tools (like PM) should call this helper rather than duplicating logic.
Summary
SchemaCommandLine provides a practical, batteries-included way to build CLIs from schemas:
- schema → argparse options (+ help/aliases/choices)
- strict parsing + helpful suggestions
- merge YAML + required_file + CLI with clear precedence
- validate/type using SchemaProjector
- optionally instantiate an object
- optionally print introspected attributes using a carefully designed external/internal mapping system
The print-attribute feature is intentionally rich because it lives at the boundary between schema naming, object naming, and user-visible CLI affordances.