Coercing numbers
DataSchemer is designed to accept human-friendly numeric input and reliably convert it into structured Python data. This is especially important for command-line tools, configuration files, and lightweight data formats, where users naturally write numbers as text, expressions, or simple tables.
The numeric coercion utilities live in coerce_numbers.py and can be used directly, or indirectly through schema-driven workflows.
The core idea
The coercion utilities take loosely structured numeric text and turn it into:
- Python scalars (int, float, Fraction, …)
- Python lists of numbers
- Nested lists representing matrices
The input may contain:
- arithmetic expressions
- symbolic substitutions
- multiple values separated by whitespace or punctuation
- simple matrix layouts
The goal is to let users write what they mean, without forcing rigid syntax or file formats.
Import
The most commonly used entry point is:
from data_schemer.coerce_numbers import coerce_number_list_with_substitutions
This function always returns Python lists, never NumPy arrays.
Results are easily converted to NumPy arrays (for example with numpy.asarray). Alternatively, import coerce_array_with_substitutions, which returns a NumPy array directly.
Scalars: numbers as expressions
At the simplest level, a single number—written as text—is parsed and evaluated:
coerce_number_list_with_substitutions("3", dtype=int)
# -> [3]
Expressions are allowed, so users don’t need to precompute values:
coerce_number_list_with_substitutions("1/2", dtype=float)
# -> [0.5]
coerce_number_list_with_substitutions("3*4 + 1", dtype=int)
# -> [13]
Supported arithmetic includes:
- addition and subtraction: +, -
- multiplication and division: *, /
- scientific notation: 1e-3, 2E+4
- square roots using rN notation, where rN is the square root of N:
coerce_number_list_with_substitutions("r2", dtype=float)
# -> [1.4142135623730951]
The grammar is intentionally small and predictable: expressive enough for scientific input, without becoming a general-purpose programming language.
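The expression rules above can be sketched with Python's standard ast module: parse the text in eval mode, then walk the tree and accept only numeric literals, the four arithmetic operators, unary signs, and rN names. This is a hypothetical evaluate_scalar helper, not DataSchemer's implementation (the source does not show how coerce_numbers.py evaluates expressions), and it assumes rN means the square root of a non-negative integer N.

```python
import ast
import math
import operator
import re

# Binary and unary operators the sketch accepts; every other node is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARYOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
# Assumption: "rN" is sqrt(N) for a non-negative integer N.
_ROOT = re.compile(r"r(\d+)")


def evaluate_scalar(text: str) -> float:
    """Hypothetical helper: evaluate a small arithmetic expression without eval()."""
    source = text.strip()  # leading spaces would be an indentation error

    def walk(node: ast.AST) -> float:
        # Numeric literals, including scientific notation such as 2E+4.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
            return _UNARYOPS[type(node.op)](walk(node.operand))
        # "r2" parses as an ordinary name; treat it as a square root.
        if isinstance(node, ast.Name):
            match = _ROOT.fullmatch(node.id)
            if match:
                return math.sqrt(int(match.group(1)))
        raise ValueError(f"unsupported syntax in {text!r}")

    return walk(ast.parse(source, mode="eval").body)
```

Rejecting calls, attributes, and any other names is what keeps a grammar like this small and safe to run on user input, in contrast to passing the text to eval().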
Multiple values: lists
Most real inputs contain more than one number. Values can be separated by:
- spaces
- commas
- semicolons
- newlines
All of the following are equivalent:
coerce_number_list_with_substitutions("1 2 3", dtype=int)
# -> [1, 2, 3]
coerce_number_list_with_substitutions("1,2,3", dtype=int)
# -> [1, 2, 3]
coerce_number_list_with_substitutions("1; 2; 3", dtype=int)
# -> [1, 2, 3]
This makes the parser forgiving and easy to use in CLI contexts.
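Separator handling of this kind can be sketched with the standard re module. One wrinkle: whitespace both separates values ("1 2 3") and appears inside expressions ("3*4 + 1"). The hypothetical split_values helper below assumes spaces around "*" and "/", or on both sides of "+" and "-", belong to an expression, while a sign written directly before a value ("1 -2") starts a new value. DataSchemer's actual tokenizing rule is not documented here and may differ.

```python
import re
from typing import List

# Assumption: "*" and "/" always bind their neighbours, and "+"/"-" with
# spaces on both sides is a binary operator. "1 -2" stays two values.
_TIGHT_OPERATORS = re.compile(r"\s*([*/])\s*")
_SPACED_PLUS_MINUS = re.compile(r"\s+([+-])\s+")
_SEPARATORS = re.compile(r"[\s,;]+")


def split_values(text: str) -> List[str]:
    """Hypothetical helper: split on spaces, commas, semicolons and newlines."""
    text = _TIGHT_OPERATORS.sub(r"\1", text)
    text = _SPACED_PLUS_MINUS.sub(r"\1", text)
    # Drop empty tokens produced by leading or trailing separators.
    return [token for token in _SEPARATORS.split(text.strip()) if token]
```

Each resulting token would then be handed to an expression evaluator.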
Matrices: nested lists
Semicolons and newlines also act as row separators. With one value per row, as in "1; 2; 3" above, the result stays flat; when rows hold several values, the result becomes a nested list:
coerce_number_list_with_substitutions("1,0;0,1", dtype=int)
# -> [[1, 0], [0, 1]]
Newlines separate rows in the same way, so a matrix can be written in its natural layout:
text = """
1 0 0
0 1 0
0 0 1
"""
coerce_number_list_with_substitutions(text, dtype=int)
# -> [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
At this stage, the structure is purely Python lists—no NumPy assumptions are made.
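The row and column rules shown in the examples can be sketched as a two-level split: rows on newlines or semicolons, values within a row on commas or other whitespace. The hypothetical parse_layout helper below reproduces the documented outputs for plain literals only (no expressions), and two of its rules are assumptions inferred from the examples rather than documented behavior: a single row or a single column collapses to a flat list, and ragged rows raise an error.

```python
import re
from fractions import Fraction
from typing import Callable, List, Union

Number = Union[int, float, Fraction]


def parse_layout(text: str, dtype: Callable[[str], Number] = float) -> list:
    """Hypothetical helper: turn a simple layout into a flat list or rows."""
    rows: List[List[Number]] = []
    # Rows are separated by newlines or semicolons...
    for line in re.split(r"[;\n]", text):
        # ...and values within a row by commas or other whitespace.
        cells = [token for token in re.split(r"[\s,]+", line.strip()) if token]
        if cells:  # skip blank lines, e.g. around a triple-quoted string
            rows.append([dtype(token) for token in cells])
    if len(rows) == 1:
        return rows[0]  # a single row, as in "1 2 3"
    if all(len(row) == 1 for row in rows):
        return [row[0] for row in rows]  # one value per row, as in "1; 2; 3"
    # Assumption: ragged input is rejected rather than returned as-is.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("rows have different lengths")
    return rows
```

Passing dtype as a callable means int, float, and Fraction all work unchanged, since each accepts a numeric string.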
Header substitutions: symbolic values
One of the most powerful features is header substitutions.
You can define symbols at the top of the input and reuse them below:
text = """
a=0.5 b=1/3
a, b
0.0, 2*a
"""
coerce_number_list_with_substitutions(text, dtype=float)
# -> [[0.5, 0.3333333333333333], [0.0, 1.0]]
How this works:
- The first line defines substitutions (a, b)
- These symbols are available in all subsequent expressions
- Expressions are evaluated after substitution
This is especially useful for:
- lattice vectors
- parameterized matrices
- avoiding repeated numeric constants
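The substitution steps above can be sketched as: read name=expr pairs from the first line into a local dictionary, then evaluate every body cell with those names in scope. This hypothetical coerce_with_header helper is an illustration only. It assumes the header is the first line and that every token in it contains "=", that a later definition may refer to an earlier one, and that body expressions contain no internal spaces.

```python
import ast
import operator
import re
from typing import Dict

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(expr: str, names: Dict[str, float]) -> float:
    """Evaluate +, -, *, / over numbers and already-defined symbols."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"cannot evaluate {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)


def coerce_with_header(text: str) -> list:
    """Hypothetical helper: read 'name=expr' definitions, then evaluate the body."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    names: Dict[str, float] = {}  # local dict, so symbols never leak between calls
    # Assumption: the first line is a header if every token in it is name=expr.
    if lines and all("=" in token for token in lines[0].split()):
        for token in lines.pop(0).split():
            name, expr = token.split("=", 1)
            names[name] = _evaluate(expr, names)  # may use earlier symbols
    rows = [
        [float(_evaluate(cell, names)) for cell in re.split(r"[\s,]+", line) if cell]
        for line in lines
    ]
    return rows[0] if len(rows) == 1 else rows
```

Keeping the symbol table local to one call is one straightforward way to get the scoping described later on this page: definitions in one input block cannot affect another.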
Exact arithmetic with Fraction
By default, numeric expressions are evaluated using floating-point arithmetic.
If you need exact rational values, you can request Fraction explicitly:
from fractions import Fraction
coerce_number_list_with_substitutions("1/3", dtype=Fraction)
# -> [Fraction(1, 3)]
This applies consistently to expressions and substitutions:
text = """
a=1/3
2*a
"""
coerce_number_list_with_substitutions(text, dtype=Fraction)
# -> [Fraction(2, 3)]
This is particularly useful in symbolic or group-theoretical contexts where exact ratios matter.
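Exact evaluation depends on never passing through a binary float. A sketch of one way to do this with the standard ast and fractions modules: rebuild each literal from its source text with Fraction(str), since Fraction("0.1") is exactly 1/10 while Fraction(0.1) is the nearest binary float. The evaluate_exact name and its names parameter are hypothetical; the source does not say how DataSchemer implements dtype=Fraction.

```python
import ast
import operator
from fractions import Fraction
from typing import Dict, Optional

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate_exact(expr: str, names: Optional[Dict[str, Fraction]] = None) -> Fraction:
    """Hypothetical helper: evaluate +, -, *, / with Fraction arithmetic throughout."""
    expr = expr.strip()
    names = names or {}

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            # Re-read the literal's source text, so "0.1" becomes exactly 1/10
            # and "1e-3" exactly 1/1000, instead of a rounded binary float.
            return Fraction(ast.get_source_segment(expr, node))
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"cannot evaluate {expr!r}")

    return walk(ast.parse(expr, mode="eval").body)
```

For example, evaluate_exact("0.1 + 0.2") gives Fraction(3, 10), whereas the float sum 0.1 + 0.2 is not equal to 0.3.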
What this function guarantees
- Output is always a Python list (possibly nested)
- All numeric values are of the requested dtype
- Expressions are evaluated deterministically
- Substitutions are scoped to the input block
- No NumPy dependency is introduced at this level
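The structural guarantees above can be expressed as a property check, for instance in a project's own test suite. This hypothetical satisfies_guarantees helper assumes "of the requested dtype" means the exact type, so it uses type(x) is dtype rather than isinstance: that rejects bool where int was requested, and also NumPy scalars such as numpy.float64, which subclass float.

```python
from fractions import Fraction
from typing import Any


def satisfies_guarantees(result: Any, dtype: type) -> bool:
    """Hypothetical check: a (possibly nested) list whose leaves are exactly dtype."""
    if not isinstance(result, list):
        return False  # tuples, arrays and bare scalars are all rejected

    def leaves_ok(value: Any) -> bool:
        if isinstance(value, list):
            return all(leaves_ok(item) for item in value)
        # Exact type match: bool is not int, numpy.float64 is not float.
        return type(value) is dtype

    return leaves_ok(result)
```

The check deliberately ignores shape; the earlier matrix examples are rectangular, but the guarantees listed above do not promise that.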
When to use numeric coercion directly
Use these utilities directly when you are:
- parsing numeric text files
- accepting numeric expressions from users
- handling CLI arguments that represent vectors or matrices
- preprocessing data before schema validation
In schema-driven workflows, you typically won’t call this yourself.
SchemaProjector invokes numeric coercion automatically when a schema variable is declared as numeric.
Design philosophy
The numeric coercion layer is intentionally:
- permissive in input syntax
- strict in output structure
- predictable and reproducible
- free of side effects
It acts as a bridge between human-readable numeric text and strongly-typed Python data—without imposing unnecessary ceremony.