SchemaCommandLine
SchemaCommandLine is DataSchemer’s schema-driven command-line engine.
It combines:
- argparse parsing (including optional argcomplete tab-completion)
- YAML input files (optional)
- an optional required positional file (per-command metadata)
- SchemaProjector validation and typing
- optional object instantiation (target_class)
- and an optional print-attribute facility for introspecting results
Where SchemaProjector is the typing/validation core, SchemaCommandLine is the CLI orchestration layer.
High-level flow
When you construct a SchemaCommandLine, it performs all work immediately:
- Resolve schema form/name (normalize_schema_inputs)
- Resolve print_attribute configuration (with inheritance)
- Build schema variables (including inheritance/copy/update/delete)
- Build an argparse parser from schema variables
- Parse CLI args (strictly; allow_abbrev=False)
- Optionally load YAML input files
- Optionally read a required positional file (and optionally parse it as YAML)
- Merge all data sources into a single raw input mapping
- Validate and type the data via SchemaProjector
- Optionally instantiate target_class
- Run _operations() (override point for custom logic)
- Optionally print attributes requested via --print-attribute
This “do everything in __init__” approach keeps the class simple to embed:
construct it once and you either get a successful run or a structured user error.
Basic example
The typical usage pattern is to subclass and set a few class attributes:
from data_schemer.schema_command_line import SchemaCommandLine
class MyCLI(SchemaCommandLine):
    pass

schema = {
    "variables": {
        "a": {"type": "int"},
        "b": {"type": "float"},
    }
}
# Equivalent of: MyCLI(schema_definitions=schema, argv=[...])
MyCLI(schema, argv=["--a", "1", "--b", "2.5"])
In practice, projects usually supply schema_definitions loaded from YAML, and set
a target_class so a typed object is available in _operations().
Entry point: main()
SchemaCommandLine.main(..., debug: bool = False) -> int
main() is a convenience wrapper that:
- returns a process exit code (0 on success)
- catches DSUserError and prints a friendly message to stderr
- suppresses tracebacks unless debug=True
This is the recommended entry point for console_scripts:
import sys

if __name__ == "__main__":
    raise SystemExit(MyCLI.main(schema, debug="--debug" in sys.argv))
Schema-driven argparse
SchemaCommandLine builds an argparse parser from the resolved schema variables using:
schema_to_argparse(schema_vars, description=..., masquerade_usage=True)
Key behaviors
- Strict parsing: allow_abbrev=False to avoid ambiguous/accidental option matches.
- No argparse-required: argparse does not enforce required arguments; the projector does.
- Schema is the contract: help, metavar, aliases, choices, bool flags are schema-driven.
- Improved UX: required variables get a (required) marker in colored help (TTY only).
Variable → CLI option mapping
A schema variable named foo_bar becomes:
--foo-bar
Aliases may be provided via alias:
- if an alias starts with -, it is used as-is (short option)
- otherwise it becomes --alias-name
Example:
variables:
  iterations:
    type: int
    alias: ["-n", "num_iterations"]
This yields options:
- -n
- --num-iterations
- --iterations
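The naming rule above can be sketched in a few lines of Python. The helper name option_strings is hypothetical and not part of DataSchemer, and the underscore-to-hyphen conversion for long aliases is inferred from the num_iterations example rather than taken from the implementation.

```python
def option_strings(name, aliases=()):
    """Return CLI option strings for a schema variable (illustrative only)."""
    options = []
    for alias in aliases:
        if alias.startswith("-"):
            # Already looks like an option (for example "-n"): keep as-is.
            options.append(alias)
        else:
            # Assumed rule: long aliases get "--" and hyphens, like variable names.
            options.append("--" + alias.replace("_", "-"))
    # The variable's own name always yields a long option.
    options.append("--" + name.replace("_", "-"))
    return options


print(option_strings("iterations", ["-n", "num_iterations"]))
print(option_strings("foo_bar"))
```

The order of the returned list mirrors the order shown in the example; the real parser may register the options in a different order.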
Data sources and precedence
SchemaCommandLine can merge values from up to three sources:
- YAML input files (optional, via an “input files” variable in the schema)
- required_file (optional positional argument configured by schema metadata)
- CLI options (always available)
Precedence is:
- YAML input files (lowest)
- required_file schema data (middle)
- CLI options (highest)
This is implemented in _merge_sources():
raw_data = dict(self._input_file_data)
raw_data.update(self._required_file_schema_data)
raw_data.update(self._cli_data)
So a user can keep defaults in YAML and override with CLI flags.
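As a standalone illustration of that precedence, the same three-step update can be run on plain dictionaries. The function name merge_sources and the sample values are made up for this sketch; only the update order comes from the snippet above.

```python
def merge_sources(input_file_data, required_file_data, cli_data):
    """Merge three mappings; later updates win on key collisions."""
    raw_data = dict(input_file_data)       # lowest precedence
    raw_data.update(required_file_data)    # middle precedence
    raw_data.update(cli_data)              # highest precedence
    return raw_data


merged = merge_sources(
    {"a": 1, "b": 2.0, "c": "from-yaml"},  # YAML input files
    {"b": 3.5},                            # required_file schema data
    {"a": 10},                             # CLI options
)
print(merged)  # {'a': 10, 'b': 3.5, 'c': 'from-yaml'}
```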
YAML input files
YAML input files are enabled by defining a schema variable whose name is given by
input_files_tag (constructor default: "input_files"):
- variable must exist in schema vars
- value is expected to be a list of file paths (often via string-list)
- each YAML file must parse to a mapping/dict
- merged sequentially (later files override earlier)
stdin can be used by passing "-" as a YAML input file path (meaning read YAML from stdin).
Important restriction: you cannot use "-" for both required_file and YAML input files in the same invocation.
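A minimal sketch of these rules, assuming each file has already been parsed into a Python object by a YAML loader. The function names and the use of ValueError are placeholders for illustration; the real implementation reports problems through its own error types.

```python
from collections.abc import Mapping


def merge_input_documents(documents):
    """Merge parsed YAML documents in order; later files override earlier ones."""
    merged = {}
    for index, document in enumerate(documents, start=1):
        if not isinstance(document, Mapping):
            raise ValueError(f"input file #{index} did not parse to a mapping")
        merged.update(document)
    return merged


def check_stdin_usage(required_file, input_files):
    """Reject "-" being requested for both sources, since stdin can be read once."""
    if required_file == "-" and "-" in input_files:
        raise ValueError('"-" cannot be used for both required_file and input files')


print(merge_input_documents([{"a": 1, "b": 2}, {"b": 3}]))  # {'a': 1, 'b': 3}
```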
required_file metadata
A command may declare a required positional file via schema metadata:
required_file:
  enabled: true
  metavar: FILE
  help: Required input file (use '-' for stdin).
  apply_as_schema_data: false
  read_mode: path   # path|text|binary
  stdin_ok: true
Behavior
- If enabled: true, argparse adds a positional argument required_file.
- How the file is consumed depends on read_mode:
  - path: do not read content (only store the path); required_file_content is None
  - text: read text content into required_file_content
  - binary: read bytes into required_file_content
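The three read modes can be pictured with a small helper. This is a sketch only: read_required_file is a hypothetical name, and stdin handling ("-") is deliberately left out.

```python
from pathlib import Path


def read_required_file(path, read_mode="path"):
    """Return the content that would be stored for a given read_mode."""
    if read_mode == "path":
        return None                      # only the path is kept, nothing is read
    if read_mode == "text":
        return Path(path).read_text()    # str content
    if read_mode == "binary":
        return Path(path).read_bytes()   # bytes content
    raise ValueError(f"unknown read_mode: {read_mode!r}")
```

In path mode the command is free to open the file itself later, which suits large inputs or files that a downstream library reads directly.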
apply_as_schema_data
If apply_as_schema_data: true, the required file is always read as text and parsed as YAML.
The resulting mapping is merged into schema input data at the “required_file” precedence level.
This is useful for commands where the primary input is a YAML block but you want it
as a required positional rather than --input-file.
Accessors
- required_file_path
- required_file_content
Unknown options and suggestions
SchemaCommandLine parses with parse_known_args() to detect unknown options and provide better errors.
- Unknown options beginning with - raise DSUnknownNameError with “did you mean” suggestions.
- Unexpected positional arguments raise DSUserError.
Suggestions are computed using close matches over normalized option names (including aliases).
This is a major UX improvement over default argparse errors.
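One plausible way to compute such suggestions is with the standard library's difflib. The normalization rule and the function names below are assumptions made for this sketch, not the library's actual code.

```python
import difflib


def normalize(option):
    """Assumed normalization: drop leading dashes, unify separators and case."""
    return option.lstrip("-").replace("_", "-").lower()


def suggest_options(unknown, known_options, limit=3):
    """Return known options whose normalized spelling is close to the unknown one."""
    by_normalized = {normalize(opt): opt for opt in known_options}
    matches = difflib.get_close_matches(
        normalize(unknown), list(by_normalized), n=limit, cutoff=0.6
    )
    return [by_normalized[m] for m in matches]


print(suggest_options("--iteratons", ["--iterations", "--input-files", "-n"]))
```

Matching on normalized names means a typo such as --num_iterations can still be matched against --num-iterations.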
target_class and object instantiation
If target_class is provided, SchemaCommandLine will:
- validate/type inputs using SchemaProjector
- call SchemaProjector.get_instance(target_class)
- expose the instantiated object via target_obj
Constructor kwargs are filtered by the target class __init__ signature (unknown kwargs ignored).
This makes schemas usable both for:
- configuration validation
- object construction
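The signature-based filtering mentioned above can be approximated with inspect.signature. The helper filter_kwargs and the Cell class are hypothetical; how the library treats a constructor that accepts **kwargs is not stated in this page, so passing everything through in that case is an assumption.

```python
import inspect


def filter_kwargs(target_class, data):
    """Keep only the keys that target_class.__init__ accepts."""
    params = inspect.signature(target_class.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(data)  # assumption: **kwargs accepts every key
    accepted = {name for name in params if name != "self"}
    return {key: value for key, value in data.items() if key in accepted}


class Cell:
    def __init__(self, a, b=0.0):
        self.a = a
        self.b = b


typed_data = {"a": 1, "b": 2.5, "verbose": True}
cell = Cell(**filter_kwargs(Cell, typed_data))  # "verbose" is dropped
print(cell.a, cell.b)  # 1 2.5
```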
Extending behavior: _operations()
Override _operations() in subclasses to implement custom logic.
At the time _operations() runs:
- self.data is available (typed dict)
- self.target_obj may be available (if target_class provided)
- required_file state is available
A common pattern is:
- compute derived quantities
- write output files
- call domain-specific libraries
print-attribute
print_attribute is a special schema feature that enables a built-in CLI option:
--print-attribute ...
This option lets users request printing one or more @property attributes from the instantiated target object.
It is designed for:
- interactive exploration
- debugging / inspection
- reproducible output (printed in YAML-style using render_mapping)
Requirements
- print-attribute requires a target_class
- print-attribute is enabled and configured by schema descriptor configuration (not per-variable)
When enabled, SchemaCommandLine injects a reserved variable into the schema:
- variable name: print_attribute (reserved; users must not define it)
- type: string-list
- optional: true
This injection ensures argparse sees a --print-attribute option without requiring you
to bake it into every schema by hand.
Where configuration comes from
SchemaCommandLine resolves print-attribute configuration via:
print_attribute = get_schema_print_attribute(schema_name, schema_definitions)
This supports inheritance and merging rules defined by the schema resolver.
The resolved value may be:
- False/None: disabled
- True: enabled, auto mode
- a dict: enabled, with advanced configuration
Print-attribute modes
After resolution, SchemaCommandLine normalizes the configuration into one of two modes:
Auto mode
Auto mode derives choices by introspecting the target class:
- all public @property names (including inherited)
- excluding private names starting with _
Those internal property names are then mapped to external names presented on the CLI.
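Auto-mode discovery of this kind can be sketched with inspect.getmembers. The function name and the example classes are illustrative; the library's own introspection may differ in detail.

```python
import inspect


def public_property_names(cls):
    """List public @property names on cls, including inherited ones."""
    members = inspect.getmembers(cls, lambda member: isinstance(member, property))
    return [name for name, _ in members if not name.startswith("_")]


class Base:
    @property
    def energy(self):
        return -12.345

    @property
    def _cache(self):  # private: skipped by the filter above
        return {}


class Derived(Base):
    @property
    def volume(self):
        return 60.0


print(public_property_names(Derived))  # ['energy', 'volume']
```

Inspecting the class rather than an instance avoids evaluating the properties just to list them.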
Manual mode
Manual mode uses an explicit list of external choices provided by schema configuration:
- no class introspection is used to produce the list
- values are still mapped to internal attribute names before reading from the object
Manual mode is selected when the resolved configuration dict contains a choices key and it is not null.
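The mode selection rules can be summarized as a small decision function. This is a sketch under the rules stated above; resolve_mode and the string labels are made-up names.

```python
def resolve_mode(config):
    """Map a resolved print_attribute value to 'disabled', 'auto', or 'manual'."""
    if config is None or config is False:
        return "disabled"
    if config is True:
        return "auto"
    if isinstance(config, dict):
        # Manual mode needs a choices key whose value is not null.
        if config.get("choices") is not None:
            return "manual"
        return "auto"
    raise TypeError(f"unsupported print_attribute value: {config!r}")


print(resolve_mode(True))                               # auto
print(resolve_mode({"choices": ["energy", "volume"]}))  # manual
print(resolve_mode({"choices": None}))                  # auto
```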
print-attribute: external vs internal names
This is the most important conceptual point:
- The CLI accepts and displays external names only.
- The underlying object is accessed using internal attribute names.
DataSchemer supports two mapping systems that may both apply:
- Schema variable code_alias (general DataSchemer feature)
- print_attribute.alias (specific to print-attribute)
Both contribute to mapping external → internal attribute names for print-attribute.
Schema code_alias interaction
Schema variable definitions may rename how constructor arguments map to the object:
variables:
  a:
    type: int
    code_alias: x
For print-attribute, SchemaCommandLine builds alias maps from schema vars:
- internal → external
- external → internal
and merges them with print-attribute’s explicit alias mapping (below).
print_attribute.alias
In dict form, print-attribute may define:
print_attribute:
  alias:
    external_name: internal_name
This affects only print-attribute resolution (not object construction).
Disjointness requirement
These two alias maps must be disjoint (no overlapping keys); otherwise the external → internal mapping would be ambiguous.
The implementation validates disjointness using merge_disjoint_maps(...)
and raises DSUserError if they overlap.
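A disjointness check of this kind might look like the following. The function is a stand-in written for illustration, not the library's merge_disjoint_maps, and it raises ValueError where the real code raises DSUserError.

```python
def merge_disjoint(first, second):
    """Merge two alias maps, refusing to proceed if any key appears in both."""
    overlap = sorted(set(first) & set(second))
    if overlap:
        raise ValueError(f"alias maps overlap on: {', '.join(overlap)}")
    return {**first, **second}


schema_aliases = {"a": "x"}            # from code_alias
print_aliases = {"e_total": "energy"}  # from print_attribute.alias
print(merge_disjoint(schema_aliases, print_aliases))
```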
print-attribute: choices
In dict form, print-attribute can constrain and control the allowed --print-attribute values.
Manual choices
print_attribute:
  choices: ["energy", "volume"]
This:
- forces manual mode
- restricts CLI candidates strictly to those external spellings
- drives tab-completion candidates (when argcomplete is installed)
If choices: null, then:
- the configuration remains enabled
- but it stays in auto mode
- and (crucially) exclude is suppressed because the choices key is present (see exclude semantics below)
This “choices key present” rule is intentional: it lets inheritance signal “explicit configuration” even when the final list is not specified.
Runtime injection of choices for argparse
To support argparse choices and tab-completion, SchemaCommandLine dynamically injects the computed choices into the injected print_attribute variable only at parser construction time.
print-attribute: exclude
In dict form, print-attribute may define an exclude list:
print_attribute:
  exclude: ["debug", "internal_state"]
Exclude is applied only in true auto mode, meaning:
- the effective dict does not contain a choices key (including inherited configs)
If choices is present (even as null), exclude is ignored by design.
Exclude works on external spellings only:
- if you exclude "foo", it will remove both the internal and external names that match "foo" from the auto-derived list
Consistency between choices and ext2int
In auto mode, exclude removes entries from:
- the completion/validation choices list
- the external→internal mapping (ext2int)
This prevents excluded names from reappearing via alias mappings.
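The exclude semantics can be sketched as a filter over the external→internal map. This is an interpretation of the rules in this section, not the actual implementation: apply_exclude is a hypothetical name, and dropping an entry when either its external or its internal name matches is an assumption based on the wording above.

```python
def apply_exclude(ext2int, config):
    """Return (choices, ext2int) after applying exclude in true auto mode."""
    if "choices" in config:
        # A choices key (even with a null value) means exclude is ignored.
        return sorted(ext2int), dict(ext2int)
    excluded = set(config.get("exclude") or [])
    kept = {
        external: internal
        for external, internal in ext2int.items()
        if external not in excluded and internal not in excluded
    }
    # Choices and mapping are derived from the same filtered dict,
    # so an excluded name cannot come back through an alias.
    return sorted(kept), kept


ext2int = {"energy": "energy", "vol": "volume", "debug": "debug"}
print(apply_exclude(ext2int, {"exclude": ["debug"]}))
print(apply_exclude(ext2int, {"choices": None, "exclude": ["debug"]}))
```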
print-attribute and inheritance: practical mental model
Because schema-level print_attribute is inherited and merged, treat it like a small “policy” object:
- A base schema can enable print-attribute broadly.
- A derived schema can:
  - add aliases
  - extend choices
  - add exclusions (in auto mode)
  - or disable print-attribute entirely (false)
The resolver and normalizer together ensure that:
- the CLI only ever exposes external spellings
- printing uses render_mapping({external_name: value})
- and mapping is deterministic.
How print-attribute output looks
When --print-attribute is used, SchemaCommandLine prints each requested attribute in YAML style:
energy: -12.345
volume: 60.0
Internally, it prints one mapping per attribute, so output remains stream-friendly.
If the external name cannot be resolved to an attribute on the object,
a DSUserError is raised with details (including internal name).
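The lookup-and-print step can be approximated as follows. Plain string formatting stands in for render_mapping, LookupError stands in for DSUserError, and the names render_attributes and Result are invented for this sketch.

```python
def render_attributes(obj, requested, ext2int):
    """Return one 'name: value' line per requested external attribute name."""
    lines = []
    for external in requested:
        internal = ext2int.get(external, external)
        if not hasattr(obj, internal):
            raise LookupError(
                f"cannot print {external!r}: object has no attribute {internal!r}"
            )
        # One mapping per attribute keeps the output stream-friendly.
        lines.append(f"{external}: {getattr(obj, internal)}")
    return lines


class Result:
    @property
    def energy(self):
        return -12.345

    @property
    def volume(self):
        return 60.0


for line in render_attributes(Result(), ["energy", "vol"], {"vol": "volume"}):
    print(line)
```

Note that the printed key is the external name the user typed (vol here), while the value is read through the internal name (volume).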
Tab completion with argcomplete
If argcomplete is installed, SchemaCommandLine enables completion via:
argcomplete.autocomplete(parser)
For --print-attribute, completion candidates are the computed external choices.
The canonical helper for computing them is:
compute_print_attribute_choices(...)
Downstream tools (like PM) should call this helper rather than duplicating logic.
Summary
SchemaCommandLine provides a practical, batteries-included way to build CLIs from schemas:
- schema → argparse options (+ help/aliases/choices)
- strict parsing + helpful suggestions
- merge YAML + required_file + CLI with clear precedence
- validate/type using SchemaProjector
- optionally instantiate an object
- optionally print introspected attributes using a carefully designed external/internal mapping system
The print-attribute feature is intentionally rich because it lives at the boundary between schema naming, object naming, and user-visible CLI affordances.