Best Practices

This page outlines some of the best practices for users of SymPy. The best practices here will help avoid some common bugs and pitfalls that can occur when using SymPy.

This page primarily focuses on best practices that apply generally to all parts of SymPy. Best practices that are specific to certain SymPy submodules or functions are outlined in the documentation for those specific functions.

Basic Usage

Defining Symbols

  • Define symbols with symbols() or Symbol(). The symbols() function is the most convenient way to create symbols. It supports creating one or more symbols at once:

    >>> from sympy import symbols
    >>> x = symbols('x')
    >>> a, b, c = symbols('a b c')
    

    Additionally, it supports adding assumptions to symbols

    >>> i, j, k = symbols('i j k', integer=True)
    

    and defining Function objects:

    >>> from sympy import Function
    >>> f, g, h = symbols('f g h', cls=Function)
    

    It also supports shorthands for defining many numbered symbols at once:

    >>> symbols('x:10')
    (x0, x1, x2, x3, x4, x5, x6, x7, x8, x9)
    

    The Symbol() constructor may also be used directly. Unlike symbols(), Symbol() always creates one symbol. It is the best option if you want to make a symbol with unusual characters in its name or if you are creating symbols programmatically.

    >>> from sympy import Symbol
    >>> x_y = Symbol('x y') # This creates a single symbol named 'x y'
    

    The var() function should be avoided, except when working interactively. It works like the symbols() function, except it automatically injects symbol names into the calling namespace. This function is designed solely for interactive typing convenience and is not recommended for programmatic use.
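
    For example (for illustration only; var() injects the name t into the calling namespace as a side effect of the call):

    >>> from sympy import var
    >>> var('t', positive=True) # Also injects the name t into the namespace
    t
    >>> t + 1
    t + 1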

    Do not use sympify() or S() to create symbols. This may appear to work:

    >>> from sympy import S
    >>> x = S("x") # DO NOT DO THIS
    

    However, S()/sympify() are not designed to create symbols; they are designed to parse entire expressions. This approach fails if the input string is not valid Python syntax, and if the string does parse but represents a larger expression, it silently returns that expression rather than a single symbol:

    >>> # These both fail
    >>> x = S("0x") 
    Traceback (most recent call last):
    ...
    SyntaxError: invalid syntax (<string>, line 1)
    >>> x = S("x+") 
    Traceback (most recent call last):
    ...
    SyntaxError: invalid syntax (<string>, line 1)
    

    By contrast, symbols() and Symbol() accept any string as a symbol name, whether or not it is a valid Python identifier.
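
    For example, a name that is not valid Python syntax is still a perfectly valid Symbol name:

    >>> Symbol("x+")
    x+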

    Furthermore, all the same issues described in the Avoid String Inputs section below apply here.

  • Add assumptions to symbols when they are known. Assumptions can be added by passing the relevant keywords to symbols(). The most common assumptions are real=True, positive=True (or nonnegative=True), and integer=True.

    Assumptions are never required, but it is recommended to include them whenever they are known, because doing so allows certain operations to simplify. If no assumptions are provided, symbols are assumed to be general complex numbers, and simplifications will not be made unless they hold for all complex numbers.

    For example:

    >>> from sympy import integrate, exp, oo
    >>> a = symbols('a') # no assumptions
    >>> integrate(exp(-a*x), (x, 0, oo))
    Piecewise((1/a, Abs(arg(a)) < pi/2), (Integral(exp(-a*x), (x, 0, oo)), True))
    
    >>> a = symbols('a', positive=True)
    >>> integrate(exp(-a*x), (x, 0, oo))
    1/a
    

    Here, \(\int_0^\infty e^{-ax}\,dx\) gives a piecewise result when a is defined without assumptions, because the integral only converges when a is positive. Defining a as positive removes the piecewise result.

    When you do use assumptions, the best practice is to always use the same assumptions for each symbol name. SymPy allows the same symbol name to be defined with different assumptions, but these symbols will be considered unequal to each other:

    >>> z1 = symbols('z')
    >>> z2 = symbols('z', positive=True)
    >>> z1 == z2
    False
    >>> z1 + z2
    z + z
    

See also Avoid String Inputs and Don’t Hardcode Symbol Names in Python Functions for related best practices around defining symbols.

Avoid String Inputs

Don’t use strings as input to functions. Rather, create the objects symbolically using Symbols and the appropriate SymPy functions, and manipulate them.

Don’t

>>> from sympy import expand
>>> expand("(x**2 + x)/x")
x + 1

Do

>>> from sympy import symbols
>>> x = symbols('x')
>>> expand((x**2 + x)/x)
x + 1

It’s always best to create expressions explicitly using Python operators, but sometimes you really do start with a string, such as when you accept an expression from the user. If you do start with a string, parse it explicitly with parse_expr(). It is best to parse all strings early and use only symbolic manipulation from then on.

>>> from sympy import parse_expr
>>> string_input = "(x**2 + x)/x"
>>> expr = parse_expr(string_input)
>>> expand(expr)
x + 1

Reason

There are many disadvantages to using strings as input to SymPy functions:

  • It is unpythonic and makes code harder to read. See the Zen of Python “explicit is better than implicit”.

  • Support for string inputs in general SymPy functions is mostly accidental. It happens because these functions call sympify() on their inputs in order to convert things like Python ints into SymPy Integers. However, sympify() also parses strings into SymPy expressions, unless the strict=True flag is used. Automatic parsing of strings for general SymPy functions (other than sympify() or parse_expr()) may go away in a future version of SymPy.
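
    For example, here is a small illustration of the difference (the exact error message may vary):

    >>> from sympy import sympify
    >>> sympify("x**2 + 1") # The string is parsed into an expression
    x**2 + 1
    >>> sympify("x**2 + 1", strict=True) # With strict=True, strings are not parsed
    Traceback (most recent call last):
    ...
    SympifyError: ...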

  • Typos in symbol or function names can go unnoticed. This is because all undefined names in the string will be automatically parsed into Symbols or Functions. If the input has a typo, the string will still parse correctly, but the output will not be what was expected. For example

    >>> from sympy import expand_trig
    >>> expand_trig("sine(x + y)")
    sine(x + y)
    

    Compare this to the explicit error you get when not using strings:

    >>> from sympy import sin, symbols
    >>> x, y = symbols('x y')
    >>> expand_trig(sine(x + y)) # The typo is caught by a NameError
    Traceback (most recent call last):
    ...
    NameError: name 'sine' is not defined
    >>> expand_trig(sin(x + y))
    sin(x)*cos(y) + sin(y)*cos(x)
    

    In the first example, sine, a typo for sin, is parsed into Function("sine"), and it appears that expand_trig cannot handle it. In the second case, we immediately get an error from the undefined name sine, and fixing our typo, we see that expand_trig can indeed do what we want.

  • The biggest gotcha when using string inputs comes from using assumptions. In SymPy, if two symbols have the same name but different assumptions, they are considered unequal:

    >>> z1 = symbols('z')
    >>> z2 = symbols('z', positive=True)
    >>> z1 == z2
    False
    >>> z1 + z2
    z + z
    

    It is generally recommended to avoid doing this, as it can lead to confusing expressions like the one above (see Defining Symbols above).

    However, string inputs will always create symbols without assumptions. So if you have a symbol with an assumption and later try to use the string version of it, you will end up with confusing results.

    >>> from sympy import diff
    >>> z = symbols('z', positive=True)
    >>> diff('z**2', z)
    0
    

    The answer here is apparently wrong, but what happened is that the z in "z**2" parsed to Symbol('z') with no assumptions, which SymPy considers to be a different symbol from z = Symbol('z', positive=True), which is used as the second argument to diff(). So as far as diff is concerned, the expression is constant and the result is 0.

    This sort of thing is particularly bad because it generally doesn’t lead to any errors. It will just silently give the “wrong” answer because SymPy will be treating symbols that you thought were the same as different. The situation is avoided by not using string inputs.

    If you are parsing strings and you want some of the symbols in them to have certain assumptions, you should create those symbols explicitly and pass them to parse_expr() in a dictionary (the local_dict argument). For example:

    Don’t

    >>> a, b, c = symbols('a b c', real=True)
    >>> # a, b, and c in expr are different symbols without assumptions
    >>> expr = parse_expr('a**2 + b - c')
    >>> expr.subs({a: 1, b: 1, c: 1}) # The substitution (apparently) doesn't work
    a**2 + b - c
    

    Do

    >>> # a, b, and c are the same as the a, b, c with real=True defined above
    >>> expr = parse_expr('a**2 + b - c', {'a': a, 'b': b, 'c': c})
    >>> expr.subs({a: 1, b: 1, c: 1})
    1
    
  • Many SymPy operations are defined as methods rather than functions; that is, they are called like sympy_obj.method_name(). These methods won’t work on strings, because strings are not SymPy objects. For example:

    >>> "x + 1".subs("x", "y")
    Traceback (most recent call last):
    ...
    AttributeError: 'str' object has no attribute 'subs'
    

    Contrasted with:

    >>> x, y = symbols('x y')
    >>> (x + 1).subs(x, y)
    y + 1
    
  • Symbol names can contain any character, including things that aren’t valid Python. But if you use strings as input, it is impossible to use such symbols. For example

    >>> from sympy import solve
    >>> solve('x_{2} - 1') 
    ValueError: Error from parse_expr with transformed code: "Symbol ('x_' ){Integer (2 )}-Integer (1 )"
    ...
    SyntaxError: invalid syntax (<string>, line 1)
    

    This doesn’t work because x_{2} is not valid Python. But it is perfectly possible to use this as a Symbol name:

    >>> x2 = symbols('x_{2}')
    >>> solve(x2 - 1, x2)
    [1]
    

    Actually, the above is the best case scenario, where you get an error. It is also possible you might get something unexpected:

    >>> solve('x^1_2 - 1')
    [-1, 1, -I, I, -1/2 - sqrt(3)*I/2, -1/2 + sqrt(3)*I/2, 1/2 - sqrt(3)*I/2, 1/2 + sqrt(3)*I/2, -sqrt(3)/2 - I/2, -sqrt(3)/2 + I/2, sqrt(3)/2 - I/2, sqrt(3)/2 + I/2]
    

    What happened here is that instead of parsing x^1_2 as the symbol \(x^1_2\), it is parsed as x**12 (the ^ is converted to **, and Python treats the _ in the numeric literal 1_2 as a digit separator, so it reads as 12).

    If we instead create a Symbol directly, the contents of the symbol name are never parsed; whatever the name contains, it is represented as a single symbol.

    >>> x12 = symbols('x^1_2')
    >>> solve(x12 - 1, x12)
    [1]
    
  • If you use strings, syntax errors won’t be caught until the line is run. If you build up the expressions in Python, syntax errors are caught immediately, before any of the code runs.

  • Syntax highlighting in code editors doesn’t typically recognize and color-code the content of strings, whereas it can recognize Python expressions.

Avoid Manipulating Expressions as Strings

If you find yourself doing a lot of string or regular expression manipulations on symbolic expressions, this is generally a sign that you are using SymPy incorrectly. It’s better to build up expressions directly with operators like +, -, *, and / and SymPy’s various functions and methods. String-based manipulations can introduce errors, grow complex quickly, and lose the benefits of symbolic expression structures.

The reason for this is that there is no notion of a symbolic expression in a string. To Python, "(x + y)/z" is no different from "/x+)(y z ", which is the same string with the characters in another order. To contrast, a SymPy expression actually knows about what type of mathematical object it represents. SymPy has many methods and functions for building and manipulating expressions, and they all operate on SymPy objects, not strings.

For example

Don’t

>>> expression_str = '+'.join([f'{i}*x_{i}' for i in range(10)])
>>> expr = parse_expr(expression_str)
>>> expr
x_1 + 2*x_2 + 3*x_3 + 4*x_4 + 5*x_5 + 6*x_6 + 7*x_7 + 8*x_8 + 9*x_9

Do

>>> from sympy import Add, Symbol
>>> expr = Add(*[i*Symbol(f'x_{i}') for i in range(10)])
>>> expr
x_1 + 2*x_2 + 3*x_3 + 4*x_4 + 5*x_5 + 6*x_6 + 7*x_7 + 8*x_8 + 9*x_9

See also the previous section on avoiding string inputs to functions.

Exact Rational Numbers vs. Floats

If a number is known to be exactly equal to some quantity, avoid defining it as a floating-point number.

For example,

Don’t

>>> expression = x**2 + 0.5*x + 1

Do

>>> from sympy import Rational
>>> expression = x**2 + Rational(1, 2)*x + 1
>>> expression = x**2 + x/2 + 1 # Equivalently

However, this isn’t to say that you should never use floating-point numbers in SymPy, only that if a more exact value is known it should be preferred. SymPy does support arbitrary precision floating-point numbers, but some operations may not perform as well with them.

This also applies to non-rational numbers which can be represented exactly. For example, one should avoid using math.pi and prefer sympy.pi, since the former is a numerical approximation to \(\pi\) and the latter is exactly \(\pi\) (see also Separate Symbolic and Numeric Code below; in general, one should avoid importing math when using SymPy).

Don’t

>>> import math
>>> import sympy
>>> math.pi
3.141592653589793
>>> sympy.sin(math.pi)
1.22464679914735e-16

Do

>>> sympy.pi
pi
>>> sympy.sin(sympy.pi)
0

Here sympy.sin(math.pi) is not exactly 0, because math.pi is not exactly \(\pi\).

One should also take care to avoid writing integer/integer where both parts are literal Python ints. This is because Python evaluates the division to a floating-point value before SymPy ever sees it.

Don’t

>>> x + 2/7 # The exact value of 2/7 is lost
x + 0.2857142857142857

In this case, use Rational to create a rational number, or use S() shorthand if you want to save on typing.

Do

>>> from sympy import Rational, S
>>> x + Rational(2, 7)
x + 2/7
>>> x + S(2)/7 # Equivalently
x + 2/7

Reason

Exact values, if they are known, should be preferred over floats for the following reasons:

  • An exact symbolic value can often be symbolically simplified or manipulated. A float represents an approximation to an exact real number, and therefore cannot be simplified exactly. For example, in the above example, sin(math.pi) does not produce 0 because math.pi is not exactly \(\pi\). It is just a floating-point number that approximates \(\pi\) to 15 digits (effectively, a close rational approximation to \(\pi\), but not exactly \(\pi\)).

  • Some algorithms will not be able to compute a result if there are floating-point values, but can if the values are rational numbers. This is because rational numbers have properties that make it easier for these algorithms to work with them. For instance, with floats, one can have a situation where a number should be 0, but due to approximation errors, does not equal exactly 0.

    A particularly notable example of this is with floating-point exponents. For example,

    >>> from sympy import factor
    >>> factor(x**2.0 - 1)
    x**2.0 - 1
    >>> factor(x**2 - 1)
    (x - 1)*(x + 1)
    
  • SymPy Floats suffer from the same loss-of-significance (cancellation) issues that arise from any finite-precision floating-point approximation:

    >>> from sympy import expand
    >>> expand((x + 1.0)*(x - 1e-16)) # the coefficient of x should be slightly less than 1
    x**2 + 1.0*x - 1.0e-16
    >>> expand((x + 1)*(x - Rational('1e-16'))) # Using rational numbers gives the coefficient of x exactly
    x**2 + 9999999999999999*x/10000000000000000 - 1/10000000000000000
    

    It is possible to avoid these issues in SymPy in many cases by making careful use of evalf with its ability to evaluate in arbitrary precision. This typically involves either computing an expression with symbolic values and substituting them later with expr.evalf(subs=...), or by starting with Float values with a precision higher than the default of 15 digits:

    >>> from sympy import Float
    >>> expand((x + 1.0)*(x - Float('1e-16', 20)))
    x**2 + 0.9999999999999999*x - 1.0e-16
    

A Float number can be converted to its exact rational equivalent by passing it to Rational. Alternatively, you can use nsimplify to find the nicest rational approximation. This can sometimes reproduce the number that was intended if the number is supposed to be rational (although again, it’s best to just start with rational numbers in the first place, if you can):

>>> from sympy import nsimplify
>>> Rational(0.7)
3152519739159347/4503599627370496
>>> nsimplify(0.7)
7/10

Avoid simplify()

simplify() (not to be confused with sympify()) is designed as a general purpose heuristic. It tries various simplification algorithms on the input expression and returns the result that seems the “simplest” based on some metric.

simplify() is perfectly fine for interactive use, where you just want SymPy to do whatever it can to an expression. However, in programmatic usage, it’s better to avoid simplify() and use more targeted simplification functions instead (e.g., cancel(), expand(), or collect()).
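
For example, here is a small comparison (both calls happen to give the same result here; the point is the guarantee about the output form, not the output itself):

>>> from sympy import cancel, simplify
>>> expr = (x**2 + 2*x + 1)/(x + 1)
>>> cancel(expr) # Guaranteed: p/q with common factors cancelled
x + 1
>>> simplify(expr) # Happens to give the same result, but makes no guarantee about the form
x + 1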

There are a few reasons why this is generally preferred:

  • Due to its heuristic nature, simplify() can potentially be slow, since it tries many different approaches in search of the best simplification.

  • There are no guarantees about what form an expression will have after being passed through simplify(). It may actually end up “less simple” by whatever metric you were hoping for. By contrast, targeted simplification functions are very specific about what behaviors they have and what they guarantee about the output. For example,

    • factor() will always factor a polynomial into irreducible factors.

    • cancel() will always convert a rational function into the form \(p/q\) where \(p\) and \(q\) are expanded polynomials with no common factors.

    The documentation for each function describes exactly what behavior it will have on the input expression.

  • A targeted simplification will not do something unexpected if the expression contains an unexpected form, or an unexpected subexpression. This is especially the case if simplification functions are applied with deep=False to only apply the simplification to the top-level expression.

Some other simplification functions are also heuristic in nature, and care should be taken with them as well. For example, the trigsimp() function is a heuristic targeted at trigonometric functions, whereas the routines in the sympy.simplify.fu submodule allow applying specific trigonometric identities.

The simplify section of the tutorial and the simplify module reference list the various targeted simplification functions.

In some cases, you may know exactly what simplification operations you wish to apply to an expression, but there may not be an exact set of simplification functions that do them. When this happens, you can create your own targeted simplification using replace(), or in general, manually using advanced expression manipulation.
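
For example, here is one possible targeted rewrite built with replace() and a Wild pattern, applying a double-angle identity only where the pattern matches (the pattern and identity here are just an illustration):

>>> from sympy import Wild, sin, cos
>>> w = Wild('w')
>>> sin(2*x).replace(sin(2*w), 2*sin(w)*cos(w))
2*sin(x)*cos(x)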

Don’t Hardcode Symbol Names in Python Functions

Instead of hard-coding Symbol names inside of a function definition, make the symbols a parameter to the function.

For example, consider a function theta_operator that computes the theta operator \(\theta = zD_z\):

Don’t

def theta_operator(expr):
    z = symbols('z')
    return z*expr.diff(z)

Do

def theta_operator(expr, z):
    return z*expr.diff(z)

A hard-coded symbol name has the disadvantage of requiring all expressions to use that exact symbol name. In the above example, it is not possible to compute \(\theta = xD_x\) because it is hard-coded to \(zD_z\). What’s worse, trying to do so silently leads to a wrong result instead of an error, since x is treated as a constant expression:

>>> def theta_operator(expr):
...     z = symbols('z')
...     return z*expr.diff(z)
>>> theta_operator(x**2) # The expected answer is 2*x**2
0

This is particularly problematic if the function accepts arbitrary user input, as the user may be using a different variable name that makes more sense in their mathematical context. And if the user already used the symbol z but as a constant, they would need to swap things around with subs before being able to use the function.

The other reason this antipattern is problematic is due to the gotcha that symbols with assumptions are considered unequal to symbols without assumptions. If someone defined their expression using

>>> z = symbols('z', positive=True)

for example, to make further simplifications possible (see Defining Symbols above), the function hard-coding Symbol('z') without assumptions would not work:

>>> theta_operator(z**2)
0

By making the symbol an argument to the function, like theta_operator(expr, z), these problems all go away.

Separate Symbolic and Numeric Code

SymPy sets itself apart from most of the rest of the libraries in the Python ecosystem in that it operates symbolically, whereas other libraries, like NumPy, operate numerically. These two paradigms are different enough that it’s always best to keep them as separate as possible.

Importantly, SymPy is not designed to work with NumPy arrays, and conversely, NumPy will not work directly with SymPy objects.

>>> import numpy as np
>>> import sympy
>>> a = np.array([0., 1., 2.])
>>> sympy.sin(a)
Traceback (most recent call last):
...
AttributeError: 'ImmutableDenseNDimArray' object has no attribute 'as_coefficient'
>>> x = Symbol('x')
>>> np.sin(x) # NumPy functions do not know how to handle SymPy expressions
Traceback (most recent call last):
...
TypeError: loop of ufunc does not support argument 0 of type Symbol which has no callable sin method

If you want to use both SymPy and NumPy, you should explicitly convert your SymPy expressions into NumPy functions using lambdify(). The typical workflow in SymPy is to model your problem symbolically using SymPy, then convert the result into a numerical function with lambdify() that can be evaluated on NumPy arrays. For advanced use-cases, lambdify()/NumPy may not be enough and you may instead need to use SymPy’s more general code generation routines to generate code for other fast numerical languages such as Fortran or C.

>>> # First symbolically construct the expression you are interested in with SymPy
>>> from sympy import diff, sin, exp, lambdify, symbols
>>> x = symbols('x')
>>> expr = diff(sin(x)*exp(x**2), x)

>>> # Then convert it to a numeric function with lambdify()
>>> f = lambdify(x, expr)

>>> # Now use this function with NumPy
>>> import numpy as np
>>> a = np.linspace(0, 10)
>>> f(a) 
[ 1.00000000e+00  1.10713341e+00  1.46699555e+00 ... -3.15033720e+44]

Here are some antipatterns that should generally be avoided:

  • Do not use import math. It is virtually never necessary to use the standard library math module alongside SymPy (or NumPy). Every function that is in math is already in SymPy. SymPy can compute values numerically using evalf, which provides more precision and accuracy than math. Or better, SymPy will by default compute things symbolically. Functions and constants in math are floats, which are inexact. SymPy always works better with exact quantities when possible. For example,

    >>> import math
    >>> math.pi # a float
    3.141592653589793
    >>> import sympy
    >>> sympy.sin(math.pi)
    1.22464679914735e-16
    

    The result of sympy.sin(math.pi) is not 0 as you might expect, because math.pi is only an approximation of \(\pi\), accurate to about 16 digits. On the other hand, sympy.pi is exactly equal to \(\pi\) because it is represented symbolically, so it is able to give the exact answer:

    >>> sympy.sin(sympy.pi)
    0
    

    So in general, one should prefer symbolic representations. But even if you actually do want a float, you are better off using SymPy’s evalf() rather than math. This avoids the pitfall that math functions can only operate on float objects, not on symbolic expressions:

    >>> x = Symbol('x')
    >>> math.sin(x)
    Traceback (most recent call last):
    ...
    TypeError: Cannot convert expression to float
    

    And furthermore, SymPy’s evalf() is more accurate than math, because it uses arbitrary precision arithmetic, and allows you to specify any number of digits.

    >>> sympy.sin(1).evalf(30)
    0.841470984807896506652502321630
    >>> math.sin(1)
    0.8414709848078965
    

    Even when using NumPy, math should be avoided. NumPy functions are much faster than their math equivalents when operating on arrays of values, support a larger range of numerical dtypes, and can operate on whole arrays at once, whereas math functions can only operate on a single scalar at a time.

  • Don’t pass SymPy expressions to a NumPy function. You should not pass a SymPy expression to a NumPy function. This includes anything in the numpy or scipy namespaces, as well as most functions from other Python libraries such as matplotlib. These functions are only designed to work with NumPy arrays with numeric values.

  • Don’t pass SymPy expressions to a lambdified function. Similar to the previous point, you should not pass SymPy expressions to a function created with lambdify. In effect, the functions returned by lambdify are NumPy functions, so the situation here is exactly the same. It is possible that in some cases a function created from lambdify() will work with a SymPy expression, but this is just an accident of the way it works. See the “how it works” section of the lambdify() documentation for more details on why this happens.

  • Avoid storing SymPy expressions in a NumPy array. While it is technically possible to store SymPy expressions inside of a NumPy array, doing so usually represents a mistake. A sign that this is happening is if the dtype of the NumPy array is object (instead of a numeric dtype like float64 or int64).

    Just as one should avoid using NumPy when doing symbolic calculations with SymPy, one should stop using SymPy once the calculation has moved over to the numeric side of things with NumPy.

    A NumPy array that contains SymPy expressions effectively has the same problem as trying to call NumPy functions directly on a SymPy expression. They do not know how to operate on SymPy objects, so they will fail. This applies even if the SymPy objects are all SymPy Floats.

    >>> import numpy as np
    >>> import sympy
    >>> a = np.asarray([sympy.Float(1.0), sympy.Float(0.0)]) # Do not do this
    >>> print(repr(a)) # Note that the dtype is 'object'
    array([1.00000000000000, 0.0], dtype=object)
    >>> np.sin(a)
    Traceback (most recent call last):
    ...
    TypeError: loop of ufunc does not support argument 0 of type Float which has no callable sin method
    

    If you are doing this, you should probably either be using native NumPy floats, or, if you really do want to store an array of SymPy expressions, you should use SymPy’s Matrix or NDimArray classes.

Advanced Usage

Be Careful Comparing and Sorting Symbolic Objects

Be careful with programmatic code that compares numerical quantities, either directly using an inequality (<, <=, >, >=) or indirectly with something like sorted. The issue is that if an inequality is unknown, the result will be symbolic, like

>>> x > 0
x > 0

A symbolic inequality will raise an exception if bool() is called on it, due to the ambiguity:

>>> bool(x > 0)
Traceback (most recent call last):
...
TypeError: cannot determine truth value of Relational

A check like

if x > 0:
    ...

may work just fine if x only ever takes numerical values when it is tested. But if x can ever be symbolic, the above code is wrong. It will fail with TypeError: cannot determine truth value of Relational. If you ever see this exception, it means this mistake has been made somewhere (sometimes the error is in SymPy itself; if this appears to be the case, please open an issue).

The exact same issue occurs when using sorted, since this internally uses >.

>>> sorted([x, 0])
Traceback (most recent call last):
...
TypeError: cannot determine truth value of Relational

There are a few options for fixing this issue, and the correct one to choose depends on what you are doing:

  • Disallow symbolic inputs. If your function cannot possibly work on symbolic inputs, you can explicitly disallow them. The primary benefit is giving users a more readable error message than TypeError: cannot determine truth value of Relational. The is_number attribute can be used to check whether an expression can be evaluated to a specific number with evalf(). If you want to accept only integers, you can check isinstance(x, Integer) (after calling sympify() to convert Python ints), as in the sketch below. Beware that is_integer uses the assumptions system and may be True even for symbolic objects, like Symbol('x', integer=True).
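
    For instance, a function that only supports explicit integers might guard its input like this (my_func and its error message are purely illustrative):

    >>> from sympy import sympify, Integer
    >>> def my_func(n):
    ...     n = sympify(n)
    ...     if not isinstance(n, Integer):
    ...         raise TypeError(f"my_func only accepts explicit integers, got {n}")
    ...     return n**2 + 1
    >>> my_func(3)
    10
    >>> my_func(x)
    Traceback (most recent call last):
    ...
    TypeError: my_func only accepts explicit integers, got x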

  • Use the assumptions system. If you do support symbolic inputs, you should use the assumptions system to check for things like x > 0, e.g., using x.is_positive. When doing this, you should always be aware of the nuances of the three-valued fuzzy logic used in the assumptions system. That is, always be aware that an assumption could be None, meaning its value is unknown and could be either true or false. For example,

    if x.is_positive:
        ...
    

    will only run the block if x.is_positive is True, but you may want to do something when x.is_positive is None.

  • Return a Piecewise result. If the result of a function depends on an inequality or other boolean condition, you can use Piecewise to return a result that represents both possibilities symbolically. This is generally preferred when possible, as it offers the most flexibility. This is because the result is represented symbolically, meaning, for instance, one can later substitute specific values for the symbols and it will evaluate to the specific case, even if it is combined with other expressions.

    For example, instead of

    if x > 0:
        expr = 1
    else:
        expr = 0
    

    this can be represented symbolically as

    >>> from sympy import Piecewise, pprint
    >>> expr = Piecewise((1, x > 0), (0, True))
    >>> pprint(expr, use_unicode=True)
    ⎧1  for x > 0
    ⎨
    ⎩0  otherwise
    >>> expr.subs(x, 1)
    1
    >>> expr.subs(x, -1)
    0
    
  • Use ordered() to sort expressions into a canonical order. If you are trying to use sorted because you want a canonical ordering, but you don’t particularly care what that ordering is, you can use ordered.

    >>> from sympy import ordered
    >>> list(ordered([x, 0]))
    [0, x]
    

    Alternatively, try to write the function in a way so that the correctness of the result does not depend on the order that arguments are processed in.

Custom SymPy Objects

SymPy is designed to be extended with custom classes, typically by subclassing Basic, Expr, or Function. All the symbolic classes in SymPy itself are written this way, and the points here apply equally to them as to user-defined classes.

For an in-depth guide on how to write a Function subclass, see the guide on writing custom functions.

Args Invariants

Custom SymPy objects should always satisfy the following invariants:

  1. all(isinstance(arg, Basic) for arg in args)

  2. expr.func(*expr.args) == expr

The first says that all elements of args should be instances of Basic. The second says that an expression should be rebuildable from its args (note that func is usually the same as type(expr), though it may not always be).

These two invariants are assumed throughout SymPy, and are essential for any function that manipulates expressions.

For example, consider this simple function, which is a simplified version of xreplace():

>>> def replace(expr, x, y):
...     """Replace x with y in expr"""
...     newargs = []
...     for arg in expr.args:
...         if arg == x:
...             newargs.append(y)
...         else:
...             newargs.append(replace(arg, x, y))
...     return expr.func(*newargs)
>>> replace(x + sin(x - 1), x, y)
y + sin(y - 1)

The function works by recursively traversing the args of expr, and rebuilding it except any instances of x are replaced by y.

It’s easy to see how this function would break if the args invariants did not hold:

  1. If an expression had args that were not Basic instances, the recursive call would fail with an AttributeError, because the non-Basic args would not have the .args or .func attributes.

  2. If an expression did not rebuild from its args, the line return expr.func(*newargs) would fail, even in the trivial case where none of the args are changed by the replacement, which should effectively be a no-op.

Making all args instances of Basic usually just means calling _sympify() on the inputs to the class so that they are Basic instances. If you want to store a string on a class, use either a Symbol or sympy.core.symbol.Str.
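
As a sketch (the class name and its arguments here are hypothetical, not an existing SymPy API), a minimal Expr subclass that sympifies its inputs satisfies both invariants:

>>> from sympy import Basic, Expr, sympify
>>> class Momentum(Expr):
...     """Hypothetical object storing a mass and a velocity."""
...     def __new__(cls, mass, velocity):
...         # Sympify the inputs so that every element of args is a Basic instance
...         return super().__new__(cls, sympify(mass), sympify(velocity))
>>> p = Momentum(2, x)
>>> all(isinstance(arg, Basic) for arg in p.args)
True
>>> p.func(*p.args) == p
True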

In some cases a class may accept args in multiple equivalent forms. It is important that whatever form is stored in args is one of the ways that can be used to reconstruct the class. It is okay to normalize args as long as that normalized form is accepted as input. For example, Integral always stores the variable argument as a tuple to make things easier to process internally, but this form is also accepted by the class constructor:

>>> from sympy import Integral
>>> expr = Integral(sin(x), x)
>>> expr.args # args are normalized
(sin(x), (x,))
>>> Integral(sin(x), (x,)) # Also accepted
Integral(sin(x), x)

Note that most user-defined custom functions should be defined by subclassing Function (see the guide to writing custom functions). The Function class automatically takes care of both of the args invariants, so if you are using it, you do not need to worry about this.

Avoid Too Much Automatic Evaluation

When defining a custom function, avoid doing too much automatic evaluation (i.e., evaluation in the eval or __new__ methods).

Generally, automatic evaluation should only be done in cases where it is fast and where it is something that no one would ever want not to happen. Automatic evaluation is difficult to undo. A good rule of thumb is to evaluate on explicit numeric values (isinstance(x, Number)) and leave everything else symbolically unevaluated. Further simplification using more advanced identities should be done in specific simplification functions or in doit (see the custom functions guide for a list of common simplification routines that can be defined on SymPy objects).

The custom functions guide goes over this in depth (but note that this guideline applies equally to all SymPy objects, not just functions). But in a nutshell, the reason for this is that the only way to prevent automatic evaluation is to use evaluate=False, which is fragile. Additionally, code will invariably be written assuming the invariants that are true due to automatic evaluations, meaning that expressions created with evaluate=False can lead to wrong results from this code. This also means that removing automatic evaluation later can be difficult.

Potentially expensive evaluation (for instance, applying a symbolic identity) is itself a problem, because it makes merely creating an expression slow, even before anything is done with it. This also applies to checking symbolic assumptions (like x.is_integer), so assumption checks should likewise be avoided in class constructors.

Don’t

class f(Function):
    @classmethod
    def eval(cls, x):
        if x.is_integer: # Bad (checking general assumptions)
            return 0
        if isinstance(x, Add): # Bad (applying symbolic identities)
            return Add(*[f(i) for i in x.args])

Do

class f(Function):
    @classmethod
    def eval(cls, x):
        if isinstance(x, Integer): # Good (only evaluating on explicit integers)
            return 0

    # Good (applying simplification on assumptions in doit())
    def doit(self, deep=True, **hints):
        x = self.args[0]
        if deep:
           x = x.doit(deep=deep, **hints)
        if x.is_integer:
           return S(0)
        return self

    # Good (applying symbolic identities inside of simplification functions)
    def _eval_expand_func(self, **hints):
        x = self.args[0]
        if isinstance(x, Add):
            return Add(*[f(i) for i in x.args])
        return self

Note that not all the classes in SymPy currently follow this guideline very well, but it is something that we are improving.

Don’t Denest Collections

Functions and classes that accept an arbitrary number of arguments should either accept the arguments directly, like f(*args), or as a single argument, like f(args). They should not try to support both at once.

The reason is that this makes it impossible to represent nested collections. For example, take the FiniteSet class. It is constructed like FiniteSet(x, y, z) (i.e., using *args).

>>> from sympy import FiniteSet
>>> FiniteSet(1, 2, 3)
{1, 2, 3}

It might be tempting to also support FiniteSet([1, 2, 3]), to match the built-in set. However, doing so would make it impossible to represent a nested FiniteSet containing a single FiniteSet, like \(\{\{1, 2, 3\}\}\):

>>> FiniteSet(FiniteSet(1, 2, 3)) # We don't want this to be the same as {1, 2, 3}
FiniteSet({1, 2, 3})

As to whether args or *args should be used: if there can only ever be a finite number of arguments, *args is generally better, since obj.args will then be the direct arguments of the class, which makes them easier to work with. However, if you might want to support a symbolic infinite collection in addition to finite ones, like Integers or Range, it is better to accept a single args argument, since an infinite collection cannot be expressed with *args.

Avoid Storing Extra Attributes on an Object

A common reason that you might want to create a custom SymPy object is that you want to store extra attributes on the object. However, doing this in a naive way, i.e., by simply storing the data as a Python attribute on the object, is almost always a bad idea.

SymPy does not expect objects to have extra data stored on them beyond what is in their args. For instance, storing extra data breaks == checking, which only compares an object’s type and args. See the Don’t Overwrite __eq__ section below for why it is a bad idea to override __eq__. This section and that one are closely related.

Typically, there is a better way to do what you are trying to do, depending on the specific details of your situation:

  • Store the extra data in the object’s args. This is the best approach if the extra data you want to store is part of the mathematical description of your object.

    As long as the data is representable using other SymPy objects, it can be stored in args. Note that an object’s args should be usable to recreate the object (e.g., something like YourObject(*instance.args) should recreate instance).

    Additionally, it should be mentioned that it is not a good idea to subclass Symbol if you plan to store anything extra in args. Symbol is designed around having no args. You are better off subclassing Function (see Writing Custom Functions) or Expr directly. If you simply want to have two symbols that are distinct from one another, the best approach is often just to give them different names. If you are concerned about how they are printed, you can replace them with a more canonical name when it comes time to print things, or use a custom printer.

  • Store the data about the object separately. This is the best approach if the extra data is not directly related to an object’s mathematical properties.

    Remember that SymPy objects are hashable, so they can easily be used as dictionary keys. So maintaining a separate dictionary of {object: extra_data} pairs is straightforward.
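
    A minimal sketch (the dictionary and the stored string are just illustrative):

    >>> extra_data = {} # Keyed by the SymPy objects themselves
    >>> extra_data[x**2 + 1] = "came from dataset A"
    >>> extra_data[x**2 + 1]
    'came from dataset A'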

    Note that some SymPy APIs already allow customizing how they operate on objects separately from the objects themselves. A big example is the printing system, which allows defining custom printers that change how any SymPy object is printed without modifying the object itself. Functions like lambdify() and init_printing() allow passing in a custom printer.

  • Represent the attribute using different subclasses. This is often a good idea if there are only a few possible values for the attribute (e.g., a boolean flag). Code duplication can be avoided by using a common superclass.

  • If the data you want to store is a Python function, it is usually best to make it a method on the class. In many cases, the method may already fit into the existing set of overridable SymPy methods. If you want to define how a function evaluates itself numerically, you can use implemented_function().

  • Represent the information by modifying the object’s func. This solution is much more complicated than the others and should only be used when necessary. In some extreme cases, it is not possible to represent every mathematical aspect of an object using args alone. This can happen, for example, because of the limitation that args should only contain Basic instances. It is still possible to create custom SymPy objects in these situations by using a custom func that is different from type(expr) (in this case, you would override __eq__ on the func rather than on the class).

    However, this sort of situation is rare.

Don’t Overwrite __eq__

When building a custom SymPy object, it is sometimes tempting to overwrite __eq__ to define custom logic for the == operator. This is almost always a bad idea. Custom SymPy classes should leave __eq__ undefined and use the default implementation in the Basic superclass.

In SymPy, == compares objects using structural equality. That is, a == b means that a and b are exactly the same expression: they have the same type and the same args. == does not perform any sort of mathematical equality checking. For example,

>>> x*(x - 1) == x**2 - x
False

== also always returns a boolean True or False. Symbolic equations can be represented with Eq.

There are several reasons for this

  • Mathematical equality checking can be very expensive to compute, and in general, it is computationally impossible to determine.

  • Python itself automatically uses == in various places and assumes that it returns a boolean and is inexpensive to compute. For example, if b is a builtin Python container like list, dict, or set, then a in b uses ==.[1]

  • SymPy internally uses == all over the place, both explicitly and implicitly via things like in or dictionary keys. This usage all implicitly assumes that == operates structurally.

In effect, structural equality means that if a == b is True, then a and b are for all intents and purposes the same object. This is because all SymPy objects are immutable. When a == b, any SymPy function may freely replace a with b in any subexpression.

The default __eq__ method on Basic checks whether the two objects have the same type and the same args. There are also many parts of SymPy that implicitly assume that if two objects are equal, then they have the same args. Therefore, it is not a good idea to override __eq__ as a way to avoid storing some identifying information about an object in its args. The args of an object should contain everything that is needed to recreate it (see args). Note that it is fine for an object’s constructor to accept multiple forms of arguments, so long as it accepts the form stored in args (e.g., it is perfectly fine for some args to have default values).

Here are some examples of reasons you might be tempted to override __eq__ and the preferred alternatives:

  • To make == apply some smarter equality check than purely structural equality. As noted above, this is a bad idea because too many things implicitly assume == works structurally only. Instead, use a function or method to implement the smarter equality checking (for example, the equals method).

    Another option is to define a canonicalization method that puts objects into canonical form (e.g., via doit), so that, for instance, x.doit() == y.doit() is true whenever x and y are mathematically equal. This is not always possible because not every type of object has a computable canonical form, but it is a convenient approach when one does exist.

  • To make == check for some additional attributes beyond those stored in the args of an expression. See the Avoid Storing Extra Attributes on an Object section above for more details on why it’s a bad idea to directly store extra attributes on a SymPy object, and what the best alternatives are.

  • To make == compare equal to some non-SymPy object. It is preferable to extend sympify to be able to convert this object into the SymPy object. The default __eq__ implementation will automatically call sympify on the other argument if it isn’t a Basic instance (e.g., Integer(1) == int(1) gives True). It is possible to extend sympify both for objects you control by defining a _sympy_ method and for objects you do not control by extending the converter dictionary. See the sympify() documentation for more details.
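
    For example, here is a sketch of the _sympy_ approach for a class you control (MyNumber is hypothetical):

    >>> import sympy
    >>> class MyNumber:
    ...     """A non-SymPy class that knows how to convert itself to a SymPy object."""
    ...     def __init__(self, value):
    ...         self.value = value
    ...     def _sympy_(self):
    ...         # Called by sympify() when converting this object
    ...         return sympy.Integer(self.value)
    >>> sympy.Integer(2) == MyNumber(2) # The default __eq__ sympifies the other side
    True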

Avoiding Infinite Recursion from Assumptions Handlers

When writing assumptions handlers on custom functions like _eval_is_positive (see the custom functions guide for details on how to do this), there are two important things to keep in mind:

Firstly, avoid creating new expressions inside of an assumptions handler; pull apart the arguments of the function directly instead. The reason is that creating a new expression could itself trigger an assumptions query, which can easily lead to infinite recursion. And even when it does not, creating a new expression can set off many recursive assumptions queries, which is bad for performance compared to querying the desired property directly.

This generally means using methods like as_independent() or as_coeff_mul() and checking the args of expressions directly (see the custom functions guide for a fuller example).
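
For example, here is a sketch for this first point (the function g and the identity it represents are hypothetical): suppose g(x) mathematically equals x + 1. To decide whether g(x) is positive, check the existing argument's assumptions directly instead of building the new expression self.args[0] + 1 and querying it:

class g(Function):
    def _eval_is_positive(self):
        # x >= 0 implies x + 1 > 0. Otherwise the sign is unknown, so fall
        # through and implicitly return None rather than False.
        if self.args[0].is_nonnegative:
            return True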

Secondly, do not recursively evaluate assumptions on self in assumptions handlers. Assumptions handlers should only check for assumptions on self.args. The global assumptions system will automatically handle implications between different assumptions.

For example, you may be tempted to write something like

# BAD

class f(Function):
    def _eval_is_integer(self):
        # Quick return if self is not real (do not do this).
        if self.is_real is False:
            return False
        return self.args[0].is_integer

However, the if self.is_real is False check is completely unnecessary. The assumptions system already knows that integer implies real, and it will not bother checking is_integer if it already knows that is_real is False.

If you define the function this way, it will lead to an infinite recursion:

>>> class f(Function):
...     def _eval_is_integer(self):
...         if self.is_real is False:
...             return False
...         return self.args[0].is_integer
>>> f(x).is_real
Traceback (most recent call last):
...
RecursionError: maximum recursion depth exceeded while calling a Python object

Instead, define the handler based on the arguments of the function only:

# GOOD

class f(Function):
    def _eval_is_integer(self):
        return self.args[0].is_integer