Coercing numbers

DataSchemer is designed to accept human-friendly numeric input and reliably convert it into structured Python data. This is especially important for command-line tools, configuration files, and lightweight data formats, where users naturally write numbers as text, expressions, or simple tables.

The numeric coercion utilities live in coerce_numbers.py. You can call them directly, or reach them indirectly through schema-driven workflows.


The core idea

The coercion utilities take loosely structured numeric text and turn it into:

  • Python scalars (int, float, Fraction, …)
  • Python lists of numbers
  • Nested lists representing matrices

The input may contain:

  • arithmetic expressions
  • symbolic substitutions
  • multiple values separated by whitespace or punctuation
  • simple matrix layouts

The goal is to let users write what they mean, without forcing rigid syntax or file formats.


Import

The most commonly used entry point is:

from data_schemer.coerce_numbers import coerce_number_list_with_substitutions

This function always returns Python lists, never NumPy arrays.
If you need a NumPy array, either convert the returned list yourself or import coerce_array_with_substitutions, which returns a NumPy array directly.


Scalars: numbers as expressions

At the simplest level, a single number written as text is parsed and evaluated:

coerce_number_list_with_substitutions("3", dtype=int)
# -> [3]

Expressions are allowed, so users don’t need to precompute values:

coerce_number_list_with_substitutions("1/2", dtype=float)
# -> [0.5]

coerce_number_list_with_substitutions("3*4 + 1", dtype=int)
# -> [13]

Supported arithmetic includes:

  • addition and subtraction: +, -
  • multiplication and division: *, /
  • scientific notation: 1e-3, 2E+4
  • square roots using rN notation:
coerce_number_list_with_substitutions("r2", dtype=float)
# -> [1.4142135623730951]

This is intentionally small but predictable: enough power for scientific input, without becoming a general-purpose programming language.
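
To illustrate how such a deliberately small expression language can be evaluated without handing user text to eval(), here is a minimal sketch built on the standard ast module. It is not DataSchemer's implementation: the function name eval_expression and the rule that rewrites rN into a square-root call are assumptions made for this example only.

```python
import ast
import math
import operator
import re

# Operators the sketch accepts; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_expression(text):
    """Evaluate +, -, *, /, numeric literals, and rN (hypothetical helper)."""
    # Rewrite rN (for example r2) as sqrt(N) so the Python parser can read it.
    prepared = re.sub(r"\br(\d+(?:\.\d+)?)", r"sqrt(\1)", text.strip())

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "sqrt"
            and len(node.args) == 1
            and not node.keywords
        ):
            return math.sqrt(walk(node.args[0]))
        raise ValueError(f"unsupported syntax in {text!r}")

    return walk(ast.parse(prepared, mode="eval"))


print(eval_expression("3*4 + 1"))  # 13
print(eval_expression("1/2"))      # 0.5
```

Walking the syntax tree against an explicit allow-list is what keeps the language small: names, attribute access, and function calls other than the rewritten square root all raise ValueError.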


Multiple values: lists

Most real inputs contain more than one number. Values can be separated by:

  • spaces
  • commas
  • semicolons
  • newlines

All of the following are equivalent:

coerce_number_list_with_substitutions("1 2 3", dtype=int)
# -> [1, 2, 3]

coerce_number_list_with_substitutions("1,2,3", dtype=int)
# -> [1, 2, 3]

coerce_number_list_with_substitutions("1; 2; 3", dtype=int)
# -> [1, 2, 3]

This makes the parser forgiving and easy to use in CLI contexts.
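
The equivalence of these separators can be pictured as a single splitting step. The sketch below is an assumption about how such splitting could be done, not the library's code; split_values is a hypothetical helper, and it treats every token as a plain literal, so it does not handle spaces inside an expression such as "3*4 + 1".

```python
import re


def split_values(text):
    """Split text on runs of whitespace, commas, or semicolons (hypothetical helper)."""
    return [tok for tok in re.split(r"[\s,;]+", text.strip()) if tok]


for raw in ("1 2 3", "1,2,3", "1; 2; 3"):
    print([int(tok) for tok in split_values(raw)])  # [1, 2, 3] each time
```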


Matrices: nested lists

When separators delimit rows of values (such as newlines or semicolons), the result becomes a nested list:

coerce_number_list_with_substitutions("1,0;0,1", dtype=int)
# -> [[1, 0], [0, 1]]

Newlines work naturally:

text = """
1 0 0
0 1 0
0 0 1
"""

coerce_number_list_with_substitutions(text, dtype=int)
# -> [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

At this stage, the structure consists purely of Python lists; no NumPy assumptions are made.
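
As a rough picture of the row logic, the following sketch splits first on row separators and then on value separators. It is an illustration under stated assumptions rather than the library's algorithm: parse_rows is a hypothetical name, tokens are converted with dtype directly instead of being evaluated as expressions, and the sketch always returns one inner list per row, whereas how DataSchemer chooses between a flat and a nested result is not specified here.

```python
import re


def parse_rows(text, dtype=int):
    """Return one inner list per non-empty row (hypothetical helper)."""
    rows = []
    # Newlines and semicolons separate rows.
    for line in re.split(r"[;\n]", text):
        # Whitespace and commas separate values within a row.
        tokens = [tok for tok in re.split(r"[\s,]+", line.strip()) if tok]
        if tokens:
            rows.append([dtype(tok) for tok in tokens])
    return rows


print(parse_rows("1,0;0,1"))  # [[1, 0], [0, 1]]
```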


Header substitutions: symbolic values

Header substitutions are one of the most powerful features.

You can define symbols at the top of the input and reuse them below:

text = """
a=0.5 b=1/3
a, b
0.0, 2*a
"""

coerce_number_list_with_substitutions(text, dtype=float)
# -> [[0.5, 0.3333333333333333], [0.0, 1.0]]

How this works:

  1. The first line defines substitutions (a, b)
  2. These symbols are available in all subsequent expressions
  3. Expressions are evaluated after substitution
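
The three steps above can be sketched as a plain text rewrite that runs before any evaluation. This is an illustration only: apply_header is a hypothetical helper, and wrapping each value in parentheses is an assumption made here so that a definition like b=1/3 keeps its meaning when it appears inside a larger expression.

```python
import re


def apply_header(text):
    """Substitute name=value definitions from the first line into later lines."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if not lines or "=" not in lines[0]:
        return lines  # no header line, nothing to substitute

    header, body = lines[0], lines[1:]
    # Map each symbol to its parenthesized definition, e.g. b -> (1/3).
    symbols = {
        name: f"({value})"
        for name, value in re.findall(r"(\w+)\s*=\s*(\S+)", header)
    }
    # Match whole symbols only, so "a" is not replaced inside a longer name.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, symbols)) + r")\b")
    return [pattern.sub(lambda m: symbols[m.group(1)], ln) for ln in body]


print(apply_header("a=0.5 b=1/3\na, b\n0.0, 2*a"))
# ['(0.5), (1/3)', '0.0, 2*(0.5)']
```

After this rewrite, each remaining expression contains only numbers and operators and can be evaluated on its own.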

This is especially useful for:

  • lattice vectors
  • parameterized matrices
  • avoiding repeated numeric constants

Exact arithmetic with Fraction

By default, numeric expressions are evaluated using floating-point arithmetic. If you need exact rational values, you can request Fraction explicitly:

from fractions import Fraction

coerce_number_list_with_substitutions("1/3", dtype=Fraction)
# -> [Fraction(1, 3)]

This applies consistently to expressions and substitutions:

text = """
a=1/3
2*a
"""

coerce_number_list_with_substitutions(text, dtype=Fraction)
# -> [Fraction(2, 3)]

This is particularly useful in symbolic or group-theoretical contexts where exact ratios matter.
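
The difference between the two arithmetic modes can be seen with the standard library alone. The comparison below does not use DataSchemer; it only shows why an exact dtype matters when values are later compared for equality.

```python
from fractions import Fraction

# Binary floats cannot represent 0.1, 0.2, or 0.3 exactly, so the sum drifts.
float_equal = (0.1 + 0.2 == 0.3)

# Fractions built from the same decimal strings stay exact.
exact_equal = (Fraction("0.1") + Fraction("0.2") == Fraction("0.3"))

# Scaling a ratio keeps it a ratio, as in the 2*a example with a = 1/3.
doubled = 2 * Fraction(1, 3)

print(float_equal, exact_equal, doubled)  # False True 2/3
```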


What this function guarantees

  • Output is always a Python list (possibly nested)
  • All numeric values are of the requested dtype
  • Expressions are evaluated deterministically
  • Substitutions are scoped to the input block
  • No NumPy dependency is introduced at this level
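
If you want to assert the first two guarantees in your own tests, a small recursive check is enough. The helper below is hypothetical and not part of DataSchemer; it accepts any possibly nested list and compares exact types, so a bool is not accepted as an int.

```python
from fractions import Fraction


def all_of_dtype(values, dtype):
    """Return True if every leaf of a possibly nested list has exactly type dtype."""
    if isinstance(values, list):
        return all(all_of_dtype(item, dtype) for item in values)
    return type(values) is dtype


print(all_of_dtype([[0.5, 0.25], [0.0, 1.0]], float))  # True
print(all_of_dtype([Fraction(2, 3)], Fraction))        # True
print(all_of_dtype([1, 2.0], int))                     # False
```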

When to use numeric coercion directly

Use these utilities directly when you are:

  • parsing numeric text files
  • accepting numeric expressions from users
  • handling CLI arguments that represent vectors or matrices
  • preprocessing data before schema validation

In schema-driven workflows, you typically won’t call this yourself. SchemaProjector invokes numeric coercion automatically when a schema variable is declared as numeric.


Design philosophy

The numeric coercion layer is intentionally:

  • permissive in input syntax
  • strict in output structure
  • predictable and reproducible
  • free of side effects

It acts as a bridge between human-readable numeric text and strongly typed Python data, without imposing unnecessary ceremony.


Last modified January 20, 2026: updating render data (f50f483)