Coercing numbers

DataSchemer is designed to accept human-friendly numeric input and reliably convert it into structured Python data. This is especially important for command-line tools, configuration files, and lightweight data formats, where users naturally write numbers as text, expressions, or simple tables.

The numeric coercion utilities live in coerce_numbers.py and can be used directly, or indirectly through schema-driven workflows.

The core idea

The coercion utilities take loosely structured numeric text and turn it into:

Python scalars (int, float, Fraction, …)
Python lists of numbers
Nested lists representing matrices

The input may contain:

arithmetic expressions
symbolic substitutions
multiple values separated by whitespace or punctuation
simple matrix layouts

The goal is to let users write what they mean, without forcing rigid syntax or file formats.

Import

The most commonly used entry point is:

from data_schemer.coerce_numbers import coerce_number_list_with_substitutions

This function always returns Python lists, never NumPy arrays.
If you need a NumPy array, convert the result (for example with numpy.array) or import coerce_array_with_substitutions, which returns a NumPy array directly.

Scalars: numbers as expressions

At the simplest level, a single number written as text is parsed and evaluated:

coerce_number_list_with_substitutions("3", dtype=int)
# -> [3]

Expressions are allowed, so users don’t need to precompute values:

coerce_number_list_with_substitutions("1/2", dtype=float)
# -> [0.5]

coerce_number_list_with_substitutions("3*4 + 1", dtype=int)
# -> [13]

Supported arithmetic includes:

addition and subtraction: +, -
multiplication and division: *, /
scientific notation: 1e-3, 2E+4
square roots using rN notation, where rN stands for the square root of N:

coerce_number_list_with_substitutions("r2", dtype=float)
# -> [1.4142135623730951]

The supported syntax is intentionally small but predictable: enough for scientific input, without becoming a general-purpose programming language.
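This page does not show how expressions are evaluated internally. As an illustration only, here is a minimal sketch of a restricted evaluator that accepts the operators listed above and rejects everything else, built on Python's standard ast module. The names safe_eval and _eval, and the regex rewrite of rN into sqrt(N), are hypothetical and not part of DataSchemer.

```python
import ast
import math
import operator
import re

# Whitelisted operators; any other syntax raises ValueError.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARYOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expr: str):
    """Hypothetical sketch: evaluate +, -, *, / and rN (square root of N)."""
    # Assumption: rN is followed by a plain number literal, e.g. r2 or r3.5.
    expr = re.sub(r"\br(\d+(?:\.\d*)?)", r"sqrt(\1)", expr)
    tree = ast.parse(expr, mode="eval")
    return _eval(tree.body)


def _eval(node: ast.AST):
    # Numeric literals, including scientific notation such as 1e-3 or 2E+4.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
        return _UNARYOPS[type(node.op)](_eval(node.operand))
    # Only the sqrt(...) call produced by the rN rewrite is allowed.
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "sqrt" and len(node.args) == 1):
        return math.sqrt(_eval(node.args[0]))
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")
```

With this sketch, safe_eval("3*4 + 1") gives 13 and safe_eval("r2") gives math.sqrt(2). Walking the syntax tree against a whitelist, rather than calling Python's eval, is one common way to keep input arithmetic from turning into arbitrary code execution.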

Multiple values: lists

Most real inputs contain more than one number. Values can be separated by:

spaces
commas
semicolons
newlines

All of the following are equivalent:

coerce_number_list_with_substitutions("1 2 3", dtype=int)
# -> [1, 2, 3]

coerce_number_list_with_substitutions("1,2,3", dtype=int)
# -> [1, 2, 3]

coerce_number_list_with_substitutions("1; 2; 3", dtype=int)
# -> [1, 2, 3]

This makes the parser forgiving and easy to use in CLI contexts.
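To make the separator rule concrete, here is a minimal sketch that treats any run of whitespace, commas, or semicolons as a single separator. The function split_values is hypothetical, and unlike DataSchemer it handles plain numeric literals only, not expressions or rows.

```python
import re

# Hypothetical: any run of whitespace, commas, or semicolons is one separator.
_SEPARATORS = re.compile(r"[\s,;]+")


def split_values(text: str, dtype=int) -> list:
    """Split text on the separators and convert each token with dtype."""
    # Drop empty tokens left by leading or trailing separators.
    tokens = [tok for tok in _SEPARATORS.split(text.strip()) if tok]
    return [dtype(tok) for tok in tokens]
```

Under this sketch, "1 2 3", "1,2,3", "1; 2; 3", and "1\n2\n3" all produce [1, 2, 3].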

Matrices: nested lists

When rows hold more than one value, newlines and semicolons act as row separators and the result becomes a nested list:

coerce_number_list_with_substitutions("1,0;0,1", dtype=int)
# -> [[1, 0], [0, 1]]

Newlines work naturally:

text = """
1 0 0
0 1 0
0 0 1
"""

coerce_number_list_with_substitutions(text, dtype=int)
# -> [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

At this stage, the structure is purely Python lists; no NumPy assumptions are made.
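The exact row-splitting rules are not spelled out on this page. The following minimal sketch is one rule set that reproduces the examples above: semicolons and newlines split rows, commas and whitespace split values within a row, and a result in which every row has a single value is flattened. The function split_matrix and the flattening rule are assumptions inferred from the examples, not DataSchemer's documented behavior, and only plain literals are handled.

```python
import re

_ROW_SEP = re.compile(r"[;\n]")      # hypothetical: rows end at ; or newline
_VALUE_SEP = re.compile(r"[\s,]+")   # values within a row: commas/whitespace


def split_matrix(text: str, dtype=int) -> list:
    """Hypothetical sketch: turn row/column text into (possibly nested) lists."""
    rows = []
    for raw_row in _ROW_SEP.split(text):
        values = [dtype(tok) for tok in _VALUE_SEP.split(raw_row.strip()) if tok]
        if values:  # skip blank lines, e.g. around a triple-quoted string
            rows.append(values)
    # Assumption: one value per row means a flat list, matching the
    # "1; 2; 3" -> [1, 2, 3] example earlier on this page.
    if all(len(row) == 1 for row in rows):
        return [row[0] for row in rows]
    return rows
```

With this sketch, split_matrix("1,0;0,1") returns [[1, 0], [0, 1]], the triple-quoted identity text returns a 3×3 nested list, and split_matrix("1; 2; 3") stays flat.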

Header substitutions: symbolic values

Header substitutions let you name a value once and reuse it throughout the input.

You can define symbols at the top of the input and reuse them below:

text = """
a=0.5 b=1/3
a, b
0.0, 2*a
"""

coerce_number_list_with_substitutions(text, dtype=float)
# -> [[0.5, 0.3333333333333333], [0.0, 1.0]]

How this works:

The first line defines substitutions (a, b)
These symbols are available in all subsequent expressions
Expressions are evaluated after substitution

This is especially useful for:

lattice vectors
parameterized matrices
avoiding repeated numeric constants
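The three steps above (read the header, make the symbols available, evaluate each expression) can be sketched as follows. This is a minimal illustration, not DataSchemer's implementation: parse_with_header and _eval are hypothetical names, and the sketch assumes the header is the first non-blank line, is recognized by containing "=", and holds whitespace-separated name=expr pairs, with rows split on commas as in the example.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node, names):
    """Evaluate a restricted arithmetic AST, looking symbols up in `names`."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)  # literals become floats, like dtype=float
    if isinstance(node, ast.Name) and node.id in names:
        return names[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, names), _eval(node.right, names))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, names)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def parse_with_header(text: str) -> list:
    """Hypothetical sketch: optional name=expr header line, then comma rows."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    names = {}
    if lines and "=" in lines[0]:
        # Pairs are evaluated left to right, so later ones may use earlier ones.
        for pair in lines.pop(0).split():
            key, expr = pair.split("=", 1)
            names[key] = _eval(ast.parse(expr, mode="eval").body, names)
    return [
        [_eval(ast.parse(tok.strip(), mode="eval").body, names) for tok in line.split(",")]
        for line in lines
    ]
```

Applied to the example text above, this sketch returns [[0.5, 0.3333333333333333], [0.0, 1.0]], the same result as shown for coerce_number_list_with_substitutions.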

Exact arithmetic with `Fraction`

By default, numeric expressions are evaluated using floating-point arithmetic. If you need exact rational values, request Fraction explicitly with dtype=Fraction:

from fractions import Fraction

coerce_number_list_with_substitutions("1/3", dtype=Fraction)
# -> [Fraction(1, 3)]

This applies consistently to expressions and substitutions:

text = """
a=1/3
2*a
"""

coerce_number_list_with_substitutions(text, dtype=Fraction)
# -> [Fraction(2, 3)]

This is particularly useful in symbolic or group-theoretical contexts where exact ratios matter.
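Why this matters can be shown with the standard fractions module alone. The following is a minimal sketch, not DataSchemer's code: the hypothetical eval_exact promotes every literal to Fraction before any arithmetic happens, so no rounding error can creep in the way it does with floats.

```python
import ast
import operator
from fractions import Fraction

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def eval_exact(expr: str, names=None) -> Fraction:
    """Hypothetical sketch: evaluate arithmetic exactly using Fraction."""
    names = names or {}

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            # Going through str() keeps 0.1 as 1/10 rather than its binary float value.
            return Fraction(str(node.value))
        if isinstance(node, ast.Name) and node.id in names:
            return Fraction(names[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval").body)
```

With floats, 0.1 + 0.2 == 0.3 is False; with this sketch, eval_exact("0.1 + 0.2") is exactly Fraction(3, 10), and eval_exact("1/3 + 1/3 + 1/3") is exactly 1.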

What this function guarantees

Output is always a Python list (possibly nested)
All numeric values are of the requested dtype
Expressions are evaluated deterministically
Substitutions are scoped to the input block
No NumPy dependency is introduced at this level
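If you rely on the first two guarantees in your own code, a small helper can check them in tests. This is a hypothetical sketch (check_structure is not part of DataSchemer) that confirms a value is a list, possibly nested, whose non-list items all have exactly the requested type.

```python
def check_structure(value, dtype) -> bool:
    """Hypothetical test helper: True if `value` is a list, possibly nested,
    whose non-list items all have exactly type `dtype`."""
    if not isinstance(value, list):
        return False
    for item in value:
        if isinstance(item, list):
            if not check_structure(item, dtype):
                return False
        # Exact type check, so e.g. True is rejected when dtype is int.
        elif type(item) is not dtype:
            return False
    return True
```

For example, a test might assert check_structure(coerce_number_list_with_substitutions("1,0;0,1", dtype=int), int).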

When to use numeric coercion directly

Use these utilities directly when you are:

parsing numeric text files
accepting numeric expressions from users
handling CLI arguments that represent vectors or matrices
preprocessing data before schema validation

In schema-driven workflows, you typically won’t call this yourself. SchemaProjector invokes numeric coercion automatically when a schema variable is declared as numeric.
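For the CLI case outside a schema, one option is to wrap parsing in an argparse type callable so that malformed input becomes a normal usage error. In this minimal sketch the callable vector_arg and the --origin option are hypothetical, and parsing is reduced to plain float literals so the example stays self-contained; in a real tool, its body could delegate to coerce_number_list_with_substitutions instead.

```python
import argparse
import re


def vector_arg(text: str) -> list:
    """Hypothetical argparse `type` callable: '1,2,3' or '1 2 3' -> floats."""
    tokens = [tok for tok in re.split(r"[\s,;]+", text.strip()) if tok]
    try:
        return [float(tok) for tok in tokens]
    except ValueError as exc:
        # argparse reports ArgumentTypeError as a clean usage error.
        raise argparse.ArgumentTypeError(f"not a numeric vector: {text!r}") from exc


parser = argparse.ArgumentParser(prog="demo")
# A non-string default is used as-is; argparse applies `type` only to strings.
parser.add_argument("--origin", type=vector_arg, default=[0.0, 0.0, 0.0])
```

Running parser.parse_args(["--origin", "1,2,3"]) yields origin == [1.0, 2.0, 3.0].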

Design philosophy

The numeric coercion layer is intentionally:

permissive in input syntax
strict in output structure
predictable and reproducible
free of side effects

It acts as a bridge between human-readable numeric text and strongly typed Python data, without imposing unnecessary ceremony.


Last modified January 20, 2026: updating render data (f50f483)