Practical, copy-paste usage examples for DataSchemer.
1 - Coercing numbers
DataSchemer is designed to accept human-friendly numeric input and reliably convert it into structured Python data. This is especially important for command-line tools, configuration files, and lightweight data formats, where users naturally write numbers as text, expressions, or simple tables.
The numeric coercion utilities live in coerce_numbers.py and can be used directly, or indirectly through schema-driven workflows.
The core idea
The coercion utilities take loosely structured numeric text and turn it into:
- Python scalars (int, float, Fraction, …)
- Python lists of numbers
- Nested lists representing matrices
The input may contain:
- arithmetic expressions
- symbolic substitutions
- multiple values separated by whitespace or punctuation
- simple matrix layouts
The goal is to let users write what they mean, without forcing rigid syntax or file formats.
Import
The most commonly used entry point is:
from data_schemer.coerce_numbers import coerce_number_list_with_substitutions
This function always returns Python lists, never NumPy arrays.
Results can be converted to NumPy arrays trivially, or you can import
coerce_array_with_substitutions, which returns a NumPy array directly.
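For example, a quick sketch of the conversion (the input string follows the matrix syntax shown later on this page):
import numpy as np
from data_schemer.coerce_numbers import coerce_number_list_with_substitutions

rows = coerce_number_list_with_substitutions("1,0;0,1", dtype=float)
arr = np.array(rows)  # plain nested lists convert directly
arr.shape             # (2, 2)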
Scalars: numbers as expressions
At the simplest level, a single number—written as text—is parsed and evaluated:
coerce_number_list_with_substitutions("3", dtype=int)
# -> [3]
Expressions are allowed, so users don’t need to precompute values:
coerce_number_list_with_substitutions("1/2", dtype=float)
# -> [0.5]
coerce_number_list_with_substitutions("3*4 + 1", dtype=int)
# -> [13]
Supported arithmetic includes:
- addition and subtraction: +, -
- multiplication and division: *, /
- scientific notation: 1e-3, 2E+4
- square roots using rN notation:
coerce_number_list_with_substitutions("r2", dtype=float)
# -> [1.4142135623730951]
This is intentionally small but predictable: enough power for scientific input, without becoming a general-purpose programming language.
Multiple values: lists
Most real inputs contain more than one number. Values can be separated by:
- spaces
- commas
- semicolons
- newlines
All of the following are equivalent:
coerce_number_list_with_substitutions("1 2 3", dtype=int)
# -> [1, 2, 3]
coerce_number_list_with_substitutions("1,2,3", dtype=int)
# -> [1, 2, 3]
coerce_number_list_with_substitutions("1; 2; 3", dtype=int)
# -> [1, 2, 3]
This makes the parser forgiving and easy to use in CLI contexts.
Matrices: nested lists
When separators imply rows (such as newlines or semicolons), the result becomes a nested list:
coerce_number_list_with_substitutions("1,0;0,1", dtype=int)
# -> [[1, 0], [0, 1]]
Newlines work naturally:
text = """
1 0 0
0 1 0
0 0 1
"""
coerce_number_list_with_substitutions(text, dtype=int)
# -> [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
At this stage, the structure is purely Python lists—no NumPy assumptions are made.
Header substitutions: symbolic values
One of the most powerful features is header substitutions.
You can define symbols at the top of the input and reuse them below:
text = """
a=0.5 b=1/3
a, b
0.0, 2*a
"""
coerce_number_list_with_substitutions(text, dtype=float)
# -> [[0.5, 0.3333333333333333], [0.0, 1.0]]
How this works:
- The first line defines substitutions (a, b)
- These symbols are available in all subsequent expressions
- Expressions are evaluated after substitution
This is especially useful for:
- lattice vectors
- parameterized matrices
- avoiding repeated numeric constants
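For instance, lattice vectors for a cubic cell can be written once in terms of a single constant (a minimal sketch following the substitution rules above; the symbol a and its value are arbitrary):
text = """
a=3.905
a, 0, 0
0, a, 0
0, 0, a
"""
coerce_number_list_with_substitutions(text, dtype=float)
# -> [[3.905, 0.0, 0.0], [0.0, 3.905, 0.0], [0.0, 0.0, 3.905]]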
Exact arithmetic with Fraction
By default, numeric expressions are evaluated using floating-point arithmetic.
If you need exact rational values, you can request Fraction explicitly:
from fractions import Fraction
coerce_number_list_with_substitutions("1/3", dtype=Fraction)
# -> [Fraction(1, 3)]
This applies consistently to expressions and substitutions:
text = """
a=1/3
2*a
"""
coerce_number_list_with_substitutions(text, dtype=Fraction)
# -> [Fraction(2, 3)]
This is particularly useful in symbolic or group-theoretical contexts where exact ratios matter.
What this function guarantees
- Output is always a Python list (possibly nested)
- All numeric values are of the requested dtype
- Expressions are evaluated deterministically
- Substitutions are scoped to the input block
- No NumPy dependency is introduced at this level
When to use numeric coercion directly
Use these utilities directly when you are:
- parsing numeric text files
- accepting numeric expressions from users
- handling CLI arguments that represent vectors or matrices
- preprocessing data before schema validation
In schema-driven workflows, you typically won’t call this yourself.
SchemaProjector invokes numeric coercion automatically when a schema variable is declared as numeric.
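As a sketch of that schema-driven path (using the SchemaProjector API documented later on this site; the variable name cell is illustrative):
from data_schemer.schema_projector import SchemaProjector

schema = {"variables": {"cell": {"type": "float-matrix", "optional": False}}}
p = SchemaProjector(schema, {"cell": "1 0 0; 0 1 0; 0 0 1"})
p.data["cell"]  # 3×3 array, coerced automatically from the text above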
Design philosophy
The numeric coercion layer is intentionally:
- permissive in input syntax
- strict in output structure
- predictable and reproducible
- free of side effects
It acts as a bridge between human-readable numeric text and strongly-typed Python data—without imposing unnecessary ceremony.
2 - Rendering data
DataSchemer includes small, focused utilities for rendering Python data into readable, YAML-compatible text. These tools are intended for human-facing output: command-line tools, reports, and snippets that users may want to copy directly into configuration files.
The rendering utilities live in render_data.py.
They are not a full serialization system. Instead, they provide:
- stable, predictable formatting
- readable alignment for numeric data
- minimal YAML-compatible syntax
- no implicit I/O or file handling
Design goals
The rendering layer is intentionally conservative.
It aims to:
- produce output that is easy to read in a terminal
- remain valid YAML when pasted into a file
- avoid introducing YAML features that obscure structure
- keep formatting decisions explicit and reproducible
For full serialization, schema validation, or round-tripping, standard YAML libraries should be used instead.
Overview of the API
The public rendering API consists of three main functions:
- render_variable
- render_array
- render_mapping
These functions are designed to compose cleanly:
render_variable dispatches to render_array or render_mapping as needed.
render_variable
render_variable(name, value, *, indent=0)
Render a single named variable and its value as YAML-style text.
This is the primary entry point used by DataSchemer command-line tools when emitting results.
Behavior
- Prepends the variable name followed by :
- Delegates rendering of the value based on its type
- Applies indentation consistently
- Produces YAML-compatible output
Example
from data_schemer.render_data import render_variable
render_variable("temperature", 300)
Output:
temperature: 300
Arrays and mappings are rendered on subsequent lines with indentation:
import numpy as np
render_variable("vector", np.array([1, 2, 3]))
vector:
  [ 1, 2, 3 ]
render_array
render_array(array, *, indent=0)
Render a one- or two-dimensional array as aligned, readable text.
This function accepts:
- NumPy arrays
- nested Python lists
- array-like objects convertible to NumPy
Formatting rules
- Arrays are rendered using bracket notation ([ ... ])
- Rows are aligned vertically for readability
- Floating-point values are trimmed of insignificant zeros
- Negative zero is suppressed (-0.0 → 0)
- Output is valid YAML (inline sequences)
Example: vector
import numpy as np
from data_schemer.render_data import render_array
render_array(np.array([0.5, 0.0, 1.0]))
[ 0.5, 0, 1 ]
Example: matrix
matrix = np.array([
    [0.5, 0.5, 0.5],
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
])
render_array(matrix, indent=2)
  [[ 0.5, 0.5, 0.5 ],
   [ 0  , 0  , 0   ],
   [ 0.5, 0  , 0   ]]
The formatting prioritizes visual alignment over compactness.
render_mapping
render_mapping(mapping, *, indent=0)
Render a mapping (typically a dictionary) as a YAML-style block.
Behavior
- Keys are rendered in insertion order
- Each key is rendered using render_variable
- Nested mappings increase indentation
- Values may be scalars, arrays, or other mappings
Example
from data_schemer.render_data import render_mapping
import numpy as np
data = {
    "positions": np.array([
        [0.5, 0.5, 0.5],
        [0.0, 0.0, 0.0],
    ]),
    "lattice_constant": 3.905,
}
print(render_mapping(data))
Output:
positions:
  [[ 0.5, 0.5, 0.5 ],
   [ 0  , 0  , 0   ]]
lattice_constant: 3.905
YAML compatibility
The output produced by the rendering utilities is:
- syntactically valid YAML
- intentionally minimal
- free of tags, anchors, or advanced YAML constructs
This makes it suitable for:
- copy-paste into configuration files
- inclusion in documentation
- inspection in terminal output
However, the rendering layer does not guarantee round-tripping back to the original Python object.
When to use render_data
Use the rendering utilities when you want:
- human-readable numeric output
- stable formatting for CLI tools
- YAML-compatible snippets without full serialization
- consistent array formatting across tools
Do not use them when you need:
- schema enforcement
- automatic file writing
- full YAML feature support
- guaranteed round-trip fidelity
Relationship to SchemaCommandLine
In schema-driven workflows, these functions are typically invoked indirectly.
SchemaCommandLine uses render_variable to emit results in a consistent, user-facing format, ensuring that command output is both readable and reusable.
Summary
The rendering layer is intentionally small:
- render_variable handles named output
- render_array handles numeric structure
- render_mapping handles hierarchical data
Together, they provide a predictable bridge between Python data structures and human-readable, YAML-compatible text.
3 - SchemaProjector
SchemaProjector is the core validation and typing engine of DataSchemer.
It takes:
- a schema definition (a dictionary)
- raw input data (a dictionary, often containing strings or loosely typed values)
and produces validated, typed Python data.
Validation happens during construction: if anything is missing, unknown, or invalid, SchemaProjector(...) raises a DataSchemer user-facing exception.
This page documents the behavior that exists in the current implementation.
Basic example
from data_schemer.schema_projector import SchemaProjector
schema = {
    "variables": {
        "a": {"type": "int", "optional": False},
        "b": {"type": "float", "optional": False},
    }
}
raw = {"a": "1", "b": "2.5"}
p = SchemaProjector(schema, raw)
assert p.data["a"] == 1
assert p.data["b"] == 2.5
Validation model
Required vs optional
In DataSchemer, variables are required by default.
- Required variables must be present in the input.
- Optional variables are explicitly marked with optional: true.
- A default may only be specified for optional variables. (If a required variable defines a default, schema loading raises an error.)
Defaults are applied before validation:
- If an optional variable has a default and the user did not provide a value, the default is injected into raw_data.
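A minimal sketch of the default mechanism (the variable names are illustrative):
from data_schemer.schema_projector import SchemaProjector

schema = {
    "variables": {
        "x": {"type": "int", "optional": False},
        "scale": {"type": "float", "optional": True, "default": 1.0},
    }
}
p = SchemaProjector(schema, {"x": "3"})
assert p.data["scale"] == 1.0  # default injected into raw_data, then coerced
assert "scale" in p.raw_data   # raw_data reflects the applied default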
Unknown keys are rejected
SchemaProjector is strict: unknown input keys are an error.
If input_data contains a key that is not declared in the schema’s variables, construction fails with DSUnknownNameError (including close-match suggestions).
This strictness is intentional: it catches typos early and keeps schemas authoritative.
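For example (a sketch; the exception classes are documented in the next section):
from data_schemer.schema_projector import SchemaProjector
from data_schemer.errors import DSUnknownNameError

schema = {"variables": {"alpha": {"type": "float", "optional": False}}}
try:
    SchemaProjector(schema, {"alpha": "1.0", "alpah": "2.0"})  # note the typo
except DSUnknownNameError as err:
    print(err)  # the message includes close-match suggestions such as "alpha"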
Additional schema restrictions
SchemaProjector enforces:
- requires: a variable may require one or more other variables to be present
- conflicts: sets of variables where at most one may be defined
- choices: restrict allowed values (including support for string-list)
Errors and what to catch
SchemaProjector raises user-facing DataSchemer exceptions from data_schemer.errors, including:
- DSMissingRequiredError — required variables were not provided
- DSUnknownNameError — schema name not found (in multi-schema form) or input contains unknown variable(s)
- DSInvalidChoiceError — an input value is not in a variable’s choices
- DSCoercionError — type coercion failed (includes variable name, raw value, expected type, and details)
- DSUserError — general user-facing schema errors (e.g., conflicts, requires)
When embedding DataSchemer, it is usually sufficient to catch DSUserError (the base class), unless you want custom handling per subtype.
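A typical embedding pattern looks like this (a sketch; exact error message wording will vary):
from data_schemer.schema_projector import SchemaProjector
from data_schemer.errors import DSUserError

schema = {"variables": {"n": {"type": "int", "optional": False}}}

def project(raw):
    try:
        return SchemaProjector(schema, raw).data
    except DSUserError as err:
        # Covers missing/unknown/choice/coercion errors via the base class.
        print(f"input error: {err}")
        return None

project({"n": "not-a-number"})  # prints a coercion error, returns None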
Schema forms: single vs multi-schema
Single schema (no name)
If the schema dictionary contains a top-level variables key, it is treated as a single schema and given the default name "default" (unless you pass an explicit name).
schema = {
    "variables": {
        "x": {"type": "int", "optional": False},
    }
}
p = SchemaProjector(schema, {"x": "3"})
assert p.schema_name == "default"
Multiple schemas with inheritance
A schema dictionary may contain multiple named schemas. A derived schema may list bases in inherit.
schema = {
    "base": {
        "variables": {
            "a": {"type": "int", "optional": False},
        }
    },
    "derived": {
        "inherit": ["base"],
        "variables": {
            "mode": {
                "type": "string",
                "optional": False,
                "choices": ["fast", "accurate"],
            }
        }
    },
}
p = SchemaProjector(schema, {"a": "2", "mode": "fast"}, "derived")
assert p.data["a"] == 2
assert p.data["mode"] == "fast"
Inheritance semantics (current behavior)
- Base schemas are applied first; child schema variables override base variables of the same name.
- Inheritance is recursive: bases may themselves inherit from other schemas.
- If schema_definitions contains multiple schemas, schema_name is required; omitting it raises DSUserError.
- If schema_name does not exist, DSUnknownNameError is raised with suggested schema names.
Note: the implementation treats inherit as an ordered list but does not explicitly define a conflict rule when multiple bases define the same variable name. In practice, the recursive update order determines the winner. If you rely on multiple inheritance, it is worth standardizing and documenting a precedence rule (or disallowing ambiguous overlaps).
Variable definitions
A schema’s variables mapping associates each variable name with a definition dictionary.
Common keys include:
- type (required; see below)
- optional (default: False)
- default (optional variables only)
- choices (list of allowed values)
- requires (list of required companion variables)
- code_alias (rename when constructing objects; see below)
- help (description of the variable, usable in generated documentation)
- metavar (string passed to documentation)
Variable copy / update / delete / conflict (advanced)
In multi-schema definitions, the current implementation supports schema-level variable transforms:
- copy_variables: define a new variable by copying an existing variable from another schema using "schema@var" syntax
- update_variables: patch selected fields of existing variable definitions
- delete_variables: remove variables from the final merged set
- conflicts: sets of variables where at most one may be defined
These features are powerful, but they also introduce complexity. If you expect users to rely on them, add a short dedicated section with one concrete example for each.
Supported types
Built-in types are defined in SchemaProjector._type_definitions. The most commonly used are:
Scalars:
- int
- float
- fraction (fractions.Fraction)
- bool (accepts True/False or "true"/"false", case-insensitive)
- string
- none (identity)
Structured:
- string-list (string → [string], list → list of strings)
- dict-string-to-float-matrix
- dict-string-to-float-vector
Numeric arrays:
- float-vector, float-matrix (support expressions + substitutions)
- int-vector, int-matrix
- fraction-vector, fraction-matrix
- string-vector, string-matrix
Vector/matrix shaping rules
Array-like types are coerced into NumPy arrays and normalized to shape:
- vectors: always 1D (reshape(-1))
- matrices:
  - scalar → (1, 1)
  - 1D → (1, N)
  - 2D → unchanged
  - higher-D → flattened into rows (reshape(-1, last_dim))
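A sketch of the normalization in practice (variable names illustrative):
schema = {
    "variables": {
        "v": {"type": "float-vector", "optional": False},
        "m": {"type": "float-matrix", "optional": False},
    }
}
p = SchemaProjector(schema, {"v": "1; 2; 3", "m": "1 2 3"})
p.data["v"].shape  # (3,): vectors are flattened to 1D
p.data["m"].shape  # (1, 3): 1D matrix input becomes a single row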
Arrays, substitutions, and named tokens
For float-vector and float-matrix, numeric coercion supports expressions and header substitutions (see numeric coercion docs).
In addition, SchemaProjector supports a small set of named array tokens for certain array types:
- I → identity matrix (3×3)
Tokens may be prefixed by an integer multiplier:
schema = {"variables": {"m": {"type": "float-matrix", "optional": False}}}
SchemaProjector(schema, {"m": "I"}).data["m"] # identity
SchemaProjector(schema, {"m": "2I"}).data["m"] # 2 × identity
Token matching is currently:
- only for specific array types (float-matrix, float-vector, int-matrix, int-vector, fraction-matrix, fraction-vector)
- only when the raw value is a string
- based on a leading integer multiplier + token name (e.g. "12I")
If you plan to add more tokens (e.g. O for zeros, E for ones, diag(...), etc.), list them here as they become public API.
Supplying additional choices at runtime
The constructor accepts an optional variable_choices mapping:
p = SchemaProjector(schema, raw, variable_choices={"mode": ["debug", "safe"]})
This extends (or adds) the schema choices for the specified variables.
If variable_choices references an unknown variable name, DSUnknownNameError is raised (with suggestions).
This is useful when a higher-level tool wants to allow extra modes without editing the schema file.
Accessing results
SchemaProjector provides:
- data — typed output (dict)
- raw_data — raw input after defaults are applied (dict)
- data_tuple — namedtuple view of data
- schema_variables — resolved variable definitions for the chosen schema (merged with inherited variables)
- schema_name — resolved schema name
- conflicts — merged conflict lists (including inherited conflicts)
String renderings (YAML-style):
- raw_data_string — render_mapping(self.raw_data)
- data_string — render_mapping(self.data)
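For example (a sketch with a one-variable schema):
schema = {"variables": {"a": {"type": "int", "optional": False}}}
p = SchemaProjector(schema, {"a": "1"})
p.data_tuple.a        # 1, via the namedtuple view
print(p.data_string)  # YAML-style rendering, e.g. "a: 1"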
Aliasing and object construction
A variable may define code_alias, which renames it when instantiating a target class:
class Foo:
    def __init__(self, x):
        self.x = x

schema = {
    "variables": {
        "a": {"type": "int", "optional": False, "code_alias": "x"},
    }
}
p = SchemaProjector(schema, {"a": "5"})
obj = p.get_instance(Foo)
assert obj.x == 5
How get_instance works (current behavior)
- The projector builds aliased_data: values are renamed according to code_alias where present.
- get_instance(cls) inspects cls.__init__ and passes only those keys that match constructor parameter names.
- Extra data keys are ignored for instantiation purposes (but unknown input keys are still rejected during projection).
This makes it easy to use the same schema both for validation and for building an object without manually wiring parameter names.
Extending domains
You can extend supported types and array tokens by creating a derived projector class:
Projector2 = SchemaProjector.domain_definitions(
    type_definitions={"double": lambda x: int(x) * 2},
    array_by_name={"Z": "0 0 0 ; 0 0 0 ; 0 0 0"},
)
schema = {"variables": {"a": {"type": "double", "optional": False}}}
p = Projector2(schema, {"a": "4"})
assert p.data["a"] == 8
domain_definitions(...) returns a subclass whose definitions extend the built-ins.
If a name collides, the new definition overrides the old one.
4 - SchemaCommandLine
SchemaCommandLine is DataSchemer’s schema-driven command-line engine.
It combines:
- argparse parsing (including optional argcomplete tab-completion)
- YAML input files (optional)
- an optional required positional file (per-command metadata)
- SchemaProjector validation and typing
- optional object instantiation (target_class)
- an optional print-attribute facility for introspecting results
Where SchemaProjector is the typing/validation core, SchemaCommandLine is the CLI orchestration layer.
High-level flow
When you construct a SchemaCommandLine, it performs all work immediately:
- Resolve schema form/name (normalize_schema_inputs)
- Resolve print_attribute configuration (with inheritance)
- Build schema variables (including inheritance/copy/update/delete)
- Build an argparse parser from schema variables
- Parse CLI args (strictly; allow_abbrev=False)
- Optionally load YAML input files
- Optionally read a required positional file (and optionally parse it as YAML)
- Merge all data sources into a single raw input mapping
- Validate and type the data via SchemaProjector
- Optionally instantiate target_class
- Run _operations() (override point for custom logic)
- Optionally print attributes requested via --print-attribute
This “do everything in __init__” approach keeps the class simple to embed:
construct it once and you either get a successful run or a structured user error.
Basic example
The typical usage pattern is to subclass and set a few class attributes:
from data_schemer.schema_command_line import SchemaCommandLine
class MyCLI(SchemaCommandLine):
    pass
schema = {
    "variables": {
        "a": {"type": "int"},
        "b": {"type": "float"},
    }
}
# Equivalent of: MyCLI(schema_definitions=schema, argv=[...])
MyCLI(schema, argv=["--a", "1", "--b", "2.5"])
In practice, projects usually supply schema_definitions loaded from YAML, and set
a target_class so a typed object is available in _operations().
Entry point: main()
SchemaCommandLine.main(..., debug: bool = False) -> int
main() is a convenience wrapper that:
- returns a process exit code (0 on success)
- catches DSUserError and prints a friendly message to stderr
- suppresses tracebacks unless debug=True
This is the recommended entry point for console_scripts:
import sys

if __name__ == "__main__":
    raise SystemExit(MyCLI.main(schema, debug="--debug" in sys.argv))
Schema-driven argparse
SchemaCommandLine builds an argparse parser from the resolved schema variables using:
schema_to_argparse(schema_vars, description=..., masquerade_usage=True)
Key behaviors
- Strict parsing: allow_abbrev=False to avoid ambiguous/accidental option matches.
- No argparse-required: argparse does not enforce required arguments; the projector does.
- Schema is the contract: help, metavar, aliases, choices, bool flags are schema-driven.
- Improved UX: required variables get a (required) marker in colored help (TTY only).
Variable → CLI option mapping
A schema variable named foo_bar becomes:
--foo-bar
Aliases may be provided via alias:
- if an alias starts with -, it is used as-is (short option)
- otherwise it becomes --alias-name
Example:
variables:
  iterations:
    type: int
    alias: ["-n", "num_iterations"]
This yields options:
- -n
- --num-iterations
- --iterations
Data sources and precedence
SchemaCommandLine can merge values from up to three sources:
- YAML input files (optional, via an “input files” variable in the schema)
- required_file (optional positional argument configured by schema metadata)
- CLI options (always available)
Precedence is:
- YAML input files (lowest)
- required_file schema data (middle)
- CLI options (highest)
This is implemented in _merge_sources():
raw_data = dict(self._input_file_data)
raw_data.update(self._required_file_schema_data)
raw_data.update(self._cli_data)
So a user can keep defaults in YAML and override with CLI flags.
YAML input files
YAML input files are enabled by defining a schema variable whose name is given by the
input_files_tag constructor argument (default: "input_files"):
- variable must exist in schema vars
- value is expected to be a list of file paths (often via string-list)
- each YAML file must parse to a mapping/dict
- merged sequentially (later files override earlier)
stdin can be used by passing "-" as a YAML input file path (meaning read YAML from stdin).
Important restriction: you cannot use "-" for both required_file and YAML input files in the same invocation.
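A sketch of the mechanism (assuming the default "input_files" variable name and a defaults.yaml file on disk; per the option-mapping rules above, the flag becomes --input-files):
from data_schemer.schema_command_line import SchemaCommandLine

schema = {
    "variables": {
        "a": {"type": "int"},
        "input_files": {"type": "string-list", "optional": True},
    }
}
# Values from defaults.yaml are merged first; --a then overrides them.
SchemaCommandLine(schema, argv=["--input-files", "defaults.yaml", "--a", "7"])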
required_file metadata
A command may declare a required positional file via schema metadata:
required_file:
  enabled: true
  metavar: FILE
  help: Required input file (use '-' for stdin).
  apply_as_schema_data: false
  read_mode: path  # path|text|binary
  stdin_ok: true
Behavior
- If enabled: true, argparse adds a positional argument required_file.
- How the file is consumed depends on read_mode:
  - path: do not read content (only store the path); required_file_content is None
  - text: read text content into required_file_content
  - binary: read bytes into required_file_content
apply_as_schema_data
If apply_as_schema_data: true, the required file is always read as text and parsed as YAML.
The resulting mapping is merged into schema input data at the “required_file” precedence level.
This is useful for commands where the primary input is a YAML block but you want it
as a required positional rather than --input-file.
Accessors
- required_file_path
- required_file_content
Unknown options and suggestions
SchemaCommandLine parses with parse_known_args() to detect unknown options and provide better errors.
- Unknown options beginning with - raise DSUnknownNameError with “did you mean” suggestions.
- Unexpected positional arguments raise DSUserError.
Suggestions are computed using close matches over normalized option names (including aliases).
This is a major UX improvement over default argparse errors.
target_class and object instantiation
If target_class is provided, SchemaCommandLine will:
- validate/type inputs using SchemaProjector
- call SchemaProjector.get_instance(target_class)
- expose the instantiated object via target_obj
Constructor kwargs are filtered by the target class __init__ signature (unknown kwargs ignored).
This makes schemas usable both for:
- configuration validation
- object construction
Extending behavior: _operations()
Override _operations() in subclasses to implement custom logic.
At the time _operations() runs:
- self.data is available (typed dict)
- self.target_obj may be available (if target_class provided)
- required_file state is available
A common pattern is:
- compute derived quantities
- write output files
- call domain-specific libraries
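A minimal sketch of that pattern (assuming _operations takes no arguments beyond self):
from data_schemer.schema_command_line import SchemaCommandLine

class SumCLI(SchemaCommandLine):
    def _operations(self):
        # self.data is the validated, typed dict at this point
        print(f"total: {self.data['a'] + self.data['b']}")

schema = {
    "variables": {
        "a": {"type": "int"},
        "b": {"type": "int"},
    }
}
SumCLI(schema, argv=["--a", "1", "--b", "2"])
# total: 3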
print-attribute
print_attribute is a special schema feature that enables a built-in CLI option:
--print-attribute ...
This option lets users request printing one or more @property attributes from the instantiated target object.
It is designed for:
- interactive exploration
- debugging / inspection
- reproducible output (printed in YAML-style using render_mapping)
Requirements
- print-attribute requires a target_class
- print-attribute is enabled and configured by schema descriptor configuration (not per-variable)
When enabled, SchemaCommandLine injects a reserved variable into the schema:
- variable name: print_attribute (reserved; users must not define it)
- type: string-list
- optional: true
This injection ensures argparse sees a --print-attribute option without requiring you
to bake it into every schema by hand.
Where configuration comes from
SchemaCommandLine resolves print-attribute configuration via:
print_attribute = get_schema_print_attribute(schema_name, schema_definitions)
This supports inheritance and merging rules defined by the schema resolver.
The resolved value may be:
- False / None: disabled
- True: enabled, auto mode
- a dict: enabled, with advanced configuration
Print-attribute modes
After resolution, SchemaCommandLine normalizes the configuration into one of two modes:
Auto mode
Auto mode derives choices by introspecting the target class:
- all public @property names (including inherited)
- excluding private names starting with _
Those internal property names are then mapped to external names presented on the CLI.
Manual mode
Manual mode uses an explicit list of external choices provided by schema configuration:
- no class introspection is used to produce the list
- values are still mapped to internal attribute names before reading from the object
Manual mode is selected when the resolved configuration dict contains a choices key and it is not null.
print-attribute: external vs internal names
This is the most important conceptual point:
- The CLI accepts and displays external names only.
- The underlying object is accessed using internal attribute names.
DataSchemer supports two mapping systems that may both apply:
- Schema variable code_alias (general DataSchemer feature)
- print_attribute.alias (specific to print-attribute)
Both contribute to mapping external → internal attribute names for print-attribute.
Schema code_alias interaction
Schema variable definitions may rename how constructor arguments map to the object:
variables:
  a:
    type: int
    code_alias: x
For print-attribute, SchemaCommandLine builds alias maps from schema vars:
- internal → external
- external → internal
and merges them with print-attribute’s explicit alias mapping (below).
print_attribute.alias
In dict form, print-attribute may define:
print_attribute:
  alias:
    external_name: internal_name
This affects only print-attribute resolution (not object construction).
Disjointness requirement
These two alias maps must be disjoint (no overlapping keys); otherwise resolution would be ambiguous.
The implementation validates disjointness using merge_disjoint_maps(...)
and raises DSUserError if they overlap.
print-attribute: choices
In dict form, print-attribute can constrain and control the allowed --print-attribute values.
Manual choices
print_attribute:
  choices: ["energy", "volume"]
This:
- forces manual mode
- restricts CLI candidates strictly to those external spellings
- drives tab-completion candidates (when argcomplete is installed)
If choices: null, then:
- the configuration remains enabled
- but it stays in auto mode
- and (crucially) exclude is suppressed because the choices key is present (see exclude semantics below)
This “choices key present” rule is intentional: it lets inheritance signal “explicit configuration” even when the final list is not specified.
Runtime injection of choices for argparse
To support argparse choices and tab-completion, SchemaCommandLine dynamically injects the computed choices into the injected print_attribute variable only at parser construction time.
print-attribute: exclude
In dict form, print-attribute may define an exclude list:
print_attribute:
  exclude: ["debug", "internal_state"]
Exclude is applied only in true auto mode, meaning:
- the effective dict does not contain a choices key (including inherited configs)
If choices is present (even as null), exclude is ignored by design.
Exclude works on external spellings only:
- if you exclude "foo", it will remove both the internal and external names that match "foo" from the auto-derived list
Consistency between choices and ext2int
In auto mode, exclude removes entries from:
- the completion/validation choices list
- the external→internal mapping (ext2int)
This prevents excluded names from reappearing via alias mappings.
print-attribute and inheritance: practical mental model
Because schema-level print_attribute is inherited and merged, treat it like a small “policy” object:
- A base schema can enable print-attribute broadly.
- A derived schema can:
  - add aliases
  - extend choices
  - add exclusions (in auto mode)
  - or disable print-attribute entirely (false)
The resolver and normalizer together ensure that:
- the CLI only ever exposes external spellings
- printing uses render_mapping({external_name: value})
- mapping is deterministic
How print-attribute output looks
When --print-attribute is used, SchemaCommandLine prints each requested attribute in YAML style:
energy: -12.345
volume: 60.0
Internally, it prints one mapping per attribute, so output remains stream-friendly.
If the external name cannot be resolved to an attribute on the object,
a DSUserError is raised with details (including internal name).
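End to end, a sketch might look like this (assuming print_attribute can be enabled as a top-level key of the schema dictionary and target_class is set as a class attribute; both placements are assumptions based on the descriptions above):
from data_schemer.schema_command_line import SchemaCommandLine

class Box:
    def __init__(self, edge):
        self.edge = edge

    @property
    def volume(self):
        return self.edge ** 3

class BoxCLI(SchemaCommandLine):
    target_class = Box  # required for print-attribute

schema = {
    "variables": {"edge": {"type": "float"}},
    "print_attribute": True,  # assumption: auto mode discovers `volume`
}
BoxCLI(schema, argv=["--edge", "2", "--print-attribute", "volume"])
# volume: 8.0 (or similar YAML-style output via render_mapping)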
Tab completion with argcomplete
If argcomplete is installed, SchemaCommandLine enables completion via:
argcomplete.autocomplete(parser)
For --print-attribute, completion candidates are the computed external choices.
The canonical helper for computing them is:
compute_print_attribute_choices(...)
Downstream tools (like PM) should call this helper rather than duplicating logic.
Summary
SchemaCommandLine provides a practical, batteries-included way to build CLIs from schemas:
- schema → argparse options (+ help/aliases/choices)
- strict parsing + helpful suggestions
- merge YAML + required_file + CLI with clear precedence
- validate/type using SchemaProjector
- optionally instantiate an object
- optionally print introspected attributes using a carefully designed external/internal mapping system
The print-attribute feature is intentionally rich because it lives at the boundary between schema naming, object naming, and user-visible CLI affordances.