There is a general need for looping over not only functions on scalars but also over functions on vectors (or arrays). This concept is realized in NumPy by generalizing the universal functions (ufuncs). In regular ufuncs, the elementary function is limited to element-by-element operations, whereas the generalized version (gufuncs) supports “sub-array” by “sub-array” operations. The Perl vector library PDL provides a similar functionality and its terms are re-used in the following.
Each generalized ufunc has information associated with it that states what the “core” dimensionality of the inputs is, as well as the corresponding dimensionality of the outputs (the element-wise ufuncs have zero core dimensions). The list of the core dimensions for all arguments is called the “signature” of a ufunc. For example, the ufunc numpy.add has signature (),()->()
defining two scalar inputs and one scalar output.
Another example is the function inner1d(a, b)
with a signature of (i),(i)->()
. This applies the inner product along the last axis of each input, but keeps the remaining indices intact. For example, where a
is of shape (3, 5, N)
and b
is of shape (5, N)
, this will return an output of shape (3,5)
. The underlying elementary function is called 3 * 5
times. In the signature, we specify one core dimension (i)
for each input and zero core dimensions ()
for the output, since it takes two 1-d arrays and returns a scalar. By using the same name i
, we specify that the two corresponding dimensions should be of the same size.
The dimensions beyond the core dimensions are called “loop” dimensions. In the above example, this corresponds to (3, 5)
.
The signature determines how the dimensions of each input/output array are split into core and loop dimensions:
i
in inner1d
’s (i),(i)->()
) must have exactly matching sizes, no broadcasting is performed.Typically, the size of all core dimensions in an output will be determined by the size of a core dimension with the same label in an input array. This is not a requirement, and it is possible to define a signature where a label comes up for the first time in an output, although some precautions must be taken when calling such a function. An example would be the function euclidean_pdist(a)
, with signature (n,d)->(p)
, that given an array of n
d
-dimensional vectors, computes all unique pairwise Euclidean distances among them. The output dimension p
must therefore be equal to n * (n - 1) / 2
, but it is the caller’s responsibility to pass in an output array of the right size. If the size of a core dimension of an output cannot be determined from a passed in input or output array, an error will be raised.
Note: Prior to NumPy 1.10.0, less strict checks were in place: missing core dimensions were created by prepending 1’s to the shape as necessary, core dimensions with the same label were broadcast together, and undetermined dimensions were created with size 1.
The signature defines “core” dimensionality of input and output variables, and thereby also defines the contraction of the dimensions. The signature is represented by a string of the following format:
(i_1,...,i_N)
; a scalar input/output is denoted by ()
. Instead of i_1
, i_2
, etc, one can use any valid Python variable name.","
. Input/output arguments are separated by "->"
.The formal syntax of signatures is as follows:
<Signature> ::= <Input arguments> "->" <Output arguments> <Input arguments> ::= <Argument list> <Output arguments> ::= <Argument list> <Argument list> ::= nil | <Argument> | <Argument> "," <Argument list> <Argument> ::= "(" <Core dimension list> ")" <Core dimension list> ::= nil | <Core dimension> | <Core dimension> "," <Core dimension list> <Core dimension> ::= <Dimension name> <Dimension modifier> <Dimension name> ::= valid Python variable name | valid integer <Dimension modifier> ::= nil | "?"
Notes:
Here are some examples of signatures:
name | signature | common usage |
---|---|---|
add | (),()->() | binary ufunc |
sum1d | (i)->() | reduction |
inner1d | (i),(i)->() | vector-vector multiplication |
matmat | (m,n),(n,p)->(m,p) | matrix multiplication |
vecmat | (n),(n,p)->(p) | vector-matrix multiplication |
matvec | (m,n),(n)->(m) | matrix-vector multiplication |
matmul | (m?,n),(n,p?)->(m?,p?) | combination of the four above |
outer_inner | (i,t),(j,t)->(i,j) | inner over the last dimension, outer over the second to last, and loop/broadcast over the rest. |
cross1d | (3),(3)->(3) | cross product where the last dimension is frozen and must be 3 |
The last is an instance of freezing a core dimension and can be used to improve ufunc performance
The current interface remains unchanged, and PyUFunc_FromFuncAndData
can still be used to implement (specialized) ufuncs, consisting of scalar elementary functions.
One can use PyUFunc_FromFuncAndDataAndSignature
to declare a more general ufunc. The argument list is the same as PyUFunc_FromFuncAndData
, with an additional argument specifying the signature as C string.
Furthermore, the callback function is of the same type as before, void (*foo)(char **args, intp *dimensions, intp *steps, void *func)
. When invoked, args
is a list of length nargs
containing the data of all input/output arguments. For a scalar elementary function, steps
is also of length nargs
, denoting the strides used for the arguments. dimensions
is a pointer to a single integer defining the size of the axis to be looped over.
For a non-trivial signature, dimensions
will also contain the sizes of the core dimensions as well, starting at the second entry. Only one size is provided for each unique dimension name and the sizes are given according to the first occurrence of a dimension name in the signature.
The first nargs
elements of steps
remain the same as for scalar ufuncs. The following elements contain the strides of all core dimensions for all arguments in order.
For example, consider a ufunc with signature (i,j),(i)->()
. In this case, args
will contain three pointers to the data of the input/output arrays a
, b
, c
. Furthermore, dimensions
will be [N, I, J]
to define the size of N
of the loop and the sizes I
and J
for the core dimensions i
and j
. Finally, steps
will be [a_N, b_N, c_N, a_i, a_j, b_i]
, containing all necessary strides.
© 2005–2019 NumPy Developers
Licensed under the 3-clause BSD License.
https://docs.scipy.org/doc/numpy-1.17.0/reference/c-api.generalized-ufuncs.html