Created starter files for the project.

This commit is contained in:
Batuhan Berk Başoğlu 2020-10-02 21:26:03 -04:00
commit 73f0c0db42
1992 changed files with 769897 additions and 0 deletions

View file

@ -0,0 +1,26 @@
import os
ref_dir = os.path.join(os.path.dirname(__file__))
__all__ = sorted(f[:-3] for f in os.listdir(ref_dir) if f.endswith('.py') and
not f.startswith('__'))
for f in __all__:
__import__(__name__ + '.' + f)
del f, ref_dir
__doc__ = """\
Topical documentation
=====================
The following topics are available:
%s
You can view them by
>>> help(np.doc.TOPIC) #doctest: +SKIP
""" % '\n- '.join([''] + __all__)
__all__.extend(['__doc__'])

View file

@ -0,0 +1,341 @@
"""
============
Array basics
============
Array types and conversions between types
=========================================
NumPy supports a much greater variety of numerical types than Python does.
This section shows which are available, and how to modify an array's data-type.
The primitive types supported are tied closely to those in C:
.. list-table::
:header-rows: 1
* - Numpy type
- C type
- Description
* - `np.bool_`
- ``bool``
- Boolean (True or False) stored as a byte
* - `np.byte`
- ``signed char``
- Platform-defined
* - `np.ubyte`
- ``unsigned char``
- Platform-defined
* - `np.short`
- ``short``
- Platform-defined
* - `np.ushort`
- ``unsigned short``
- Platform-defined
* - `np.intc`
- ``int``
- Platform-defined
* - `np.uintc`
- ``unsigned int``
- Platform-defined
* - `np.int_`
- ``long``
- Platform-defined
* - `np.uint`
- ``unsigned long``
- Platform-defined
* - `np.longlong`
- ``long long``
- Platform-defined
* - `np.ulonglong`
- ``unsigned long long``
- Platform-defined
* - `np.half` / `np.float16`
-
- Half precision float:
sign bit, 5 bits exponent, 10 bits mantissa
* - `np.single`
- ``float``
- Platform-defined single precision float:
typically sign bit, 8 bits exponent, 23 bits mantissa
* - `np.double`
- ``double``
- Platform-defined double precision float:
typically sign bit, 11 bits exponent, 52 bits mantissa.
* - `np.longdouble`
- ``long double``
- Platform-defined extended-precision float
* - `np.csingle`
- ``float complex``
- Complex number, represented by two single-precision floats (real and imaginary components)
* - `np.cdouble`
- ``double complex``
- Complex number, represented by two double-precision floats (real and imaginary components).
* - `np.clongdouble`
- ``long double complex``
- Complex number, represented by two extended-precision floats (real and imaginary components).
Since many of these have platform-dependent definitions, a set of fixed-size
aliases are provided:
.. list-table::
:header-rows: 1
* - Numpy type
- C type
- Description
* - `np.int8`
- ``int8_t``
- Byte (-128 to 127)
* - `np.int16`
- ``int16_t``
- Integer (-32768 to 32767)
* - `np.int32`
- ``int32_t``
- Integer (-2147483648 to 2147483647)
* - `np.int64`
- ``int64_t``
- Integer (-9223372036854775808 to 9223372036854775807)
* - `np.uint8`
- ``uint8_t``
- Unsigned integer (0 to 255)
* - `np.uint16`
- ``uint16_t``
- Unsigned integer (0 to 65535)
* - `np.uint32`
- ``uint32_t``
- Unsigned integer (0 to 4294967295)
* - `np.uint64`
- ``uint64_t``
- Unsigned integer (0 to 18446744073709551615)
* - `np.intp`
- ``intptr_t``
- Integer used for indexing, typically the same as ``ssize_t``
* - `np.uintp`
- ``uintptr_t``
- Integer large enough to hold a pointer
* - `np.float32`
- ``float``
-
* - `np.float64` / `np.float_`
- ``double``
- Note that this matches the precision of the builtin python `float`.
* - `np.complex64`
- ``float complex``
- Complex number, represented by two 32-bit floats (real and imaginary components)
* - `np.complex128` / `np.complex_`
- ``double complex``
- Note that this matches the precision of the builtin python `complex`.
NumPy numerical types are instances of ``dtype`` (data-type) objects, each
having unique characteristics. Once you have imported NumPy using
::
>>> import numpy as np
the dtypes are available as ``np.bool_``, ``np.float32``, etc.
Advanced types, not listed in the table above, are explored in
section :ref:`structured_arrays`.
There are 5 basic numerical types representing booleans (bool), integers (int),
unsigned integers (uint) floating point (float) and complex. Those with numbers
in their name indicate the bitsize of the type (i.e. how many bits are needed
to represent a single value in memory). Some types, such as ``int`` and
``intp``, have differing bitsizes, dependent on the platforms (e.g. 32-bit
vs. 64-bit machines). This should be taken into account when interfacing
with low-level code (such as C or Fortran) where the raw memory is addressed.
Data-types can be used as functions to convert python numbers to array scalars
(see the array scalar section for an explanation), python sequences of numbers
to arrays of that type, or as arguments to the dtype keyword that many numpy
functions or methods accept. Some examples::
>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)
Array types can also be referred to by character codes, mostly to retain
backward compatibility with older packages such as Numeric. Some
documentation may still refer to these, for example::
>>> np.array([1, 2, 3], dtype='f')
array([ 1., 2., 3.], dtype=float32)
We recommend using dtype objects instead.
To convert the type of an array, use the .astype() method (preferred) or
the type itself as a function. For example: ::
>>> z.astype(float) #doctest: +NORMALIZE_WHITESPACE
array([ 0., 1., 2.])
>>> np.int8(z)
array([0, 1, 2], dtype=int8)
Note that, above, we use the *Python* float object as a dtype. NumPy knows
that ``int`` refers to ``np.int_``, ``bool`` means ``np.bool_``,
that ``float`` is ``np.float_`` and ``complex`` is ``np.complex_``.
The other data-types do not have Python equivalents.
To determine the type of an array, look at the dtype attribute::
>>> z.dtype
dtype('uint8')
dtype objects also contain information about the type, such as its bit-width
and its byte-order. The data type can also be used indirectly to query
properties of the type, such as whether it is an integer::
>>> d = np.dtype(int)
>>> d
dtype('int32')
>>> np.issubdtype(d, np.integer)
True
>>> np.issubdtype(d, np.floating)
False
Array Scalars
=============
NumPy generally returns elements of arrays as array scalars (a scalar
with an associated dtype). Array scalars differ from Python scalars, but
for the most part they can be used interchangeably (the primary
exception is for versions of Python older than v2.x, where integer array
scalars cannot act as indices for lists and tuples). There are some
exceptions, such as when code requires very specific attributes of a scalar
or when it checks specifically whether a value is a Python scalar. Generally,
problems are easily fixed by explicitly converting array scalars
to Python scalars, using the corresponding Python type function
(e.g., ``int``, ``float``, ``complex``, ``str``, ``unicode``).
The primary advantage of using array scalars is that
they preserve the array type (Python may not have a matching scalar type
available, e.g. ``int16``). Therefore, the use of array scalars ensures
identical behaviour between arrays and scalars, irrespective of whether the
value is inside an array or not. NumPy scalars also have many of the same
methods arrays do.
Overflow Errors
===============
The fixed size of NumPy numeric types may cause overflow errors when a value
requires more memory than available in the data type. For example,
`numpy.power` evaluates ``100 * 10 ** 8`` correctly for 64-bit integers,
but gives 1874919424 (incorrect) for a 32-bit integer.
>>> np.power(100, 8, dtype=np.int64)
10000000000000000
>>> np.power(100, 8, dtype=np.int32)
1874919424
The behaviour of NumPy and Python integer types differs significantly for
integer overflows and may confuse users expecting NumPy integers to behave
similar to Python's ``int``. Unlike NumPy, the size of Python's ``int`` is
flexible. This means Python integers may expand to accommodate any integer and
will not overflow.
NumPy provides `numpy.iinfo` and `numpy.finfo` to verify the
minimum or maximum values of NumPy integer and floating point values
respectively ::
>>> np.iinfo(int) # Bounds of the default integer on this system.
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
>>> np.iinfo(np.int32) # Bounds of a 32-bit integer
iinfo(min=-2147483648, max=2147483647, dtype=int32)
>>> np.iinfo(np.int64) # Bounds of a 64-bit integer
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
If 64-bit integers are still too small the result may be cast to a
floating point number. Floating point numbers offer a larger, but inexact,
range of possible values.
>>> np.power(100, 100, dtype=np.int64) # Incorrect even with 64-bit int
0
>>> np.power(100, 100, dtype=np.float64)
1e+200
Extended Precision
==================
Python's floating-point numbers are usually 64-bit floating-point numbers,
nearly equivalent to ``np.float64``. In some unusual situations it may be
useful to use floating-point numbers with more precision. Whether this
is possible in numpy depends on the hardware and on the development
environment: specifically, x86 machines provide hardware floating-point
with 80-bit precision, and while most C compilers provide this as their
``long double`` type, MSVC (standard for Windows builds) makes
``long double`` identical to ``double`` (64 bits). NumPy makes the
compiler's ``long double`` available as ``np.longdouble`` (and
``np.clongdouble`` for the complex numbers). You can find out what your
numpy provides with ``np.finfo(np.longdouble)``.
NumPy does not provide a dtype with more precision than C's
``long double``\\; in particular, the 128-bit IEEE quad precision
data type (FORTRAN's ``REAL*16``\\) is not available.
For efficient memory alignment, ``np.longdouble`` is usually stored
padded with zero bits, either to 96 or 128 bits. Which is more efficient
depends on hardware and development environment; typically on 32-bit
systems they are padded to 96 bits, while on 64-bit systems they are
typically padded to 128 bits. ``np.longdouble`` is padded to the system
default; ``np.float96`` and ``np.float128`` are provided for users who
want specific padding. In spite of the names, ``np.float96`` and
``np.float128`` provide only as much precision as ``np.longdouble``,
that is, 80 bits on most x86 machines and 64 bits in standard
Windows builds.
Be warned that even if ``np.longdouble`` offers more precision than
python ``float``, it is easy to lose that extra precision, since
python often forces values to pass through ``float``. For example,
the ``%`` formatting operator requires its arguments to be converted
to standard python types, and it is therefore impossible to preserve
extended precision even if many decimal places are requested. It can
be useful to test your code with the value
``1 + np.finfo(np.longdouble).eps``.
"""

View file

@ -0,0 +1,180 @@
"""
========================
Broadcasting over arrays
========================
.. note::
See `this article
<https://numpy.org/devdocs/user/theory.broadcasting.html>`_
for illustrations of broadcasting concepts.
The term broadcasting describes how numpy treats arrays with different
shapes during arithmetic operations. Subject to certain constraints,
the smaller array is "broadcast" across the larger array so that they
have compatible shapes. Broadcasting provides a means of vectorizing
array operations so that looping occurs in C instead of Python. It does
this without making needless copies of data and usually leads to
efficient algorithm implementations. There are, however, cases where
broadcasting is a bad idea because it leads to inefficient use of memory
that slows computation.
NumPy operations are usually done on pairs of arrays on an
element-by-element basis. In the simplest case, the two arrays must
have exactly the same shape, as in the following example:
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = np.array([2.0, 2.0, 2.0])
>>> a * b
array([ 2., 4., 6.])
NumPy's broadcasting rule relaxes this constraint when the arrays'
shapes meet certain constraints. The simplest broadcasting example occurs
when an array and a scalar value are combined in an operation:
>>> a = np.array([1.0, 2.0, 3.0])
>>> b = 2.0
>>> a * b
array([ 2., 4., 6.])
The result is equivalent to the previous example where ``b`` was an array.
We can think of the scalar ``b`` being *stretched* during the arithmetic
operation into an array with the same shape as ``a``. The new elements in
``b`` are simply copies of the original scalar. The stretching analogy is
only conceptual. NumPy is smart enough to use the original scalar value
without actually making copies so that broadcasting operations are as
memory and computationally efficient as possible.
The code in the second example is more efficient than that in the first
because broadcasting moves less memory around during the multiplication
(``b`` is a scalar rather than an array).
General Broadcasting Rules
==========================
When operating on two arrays, NumPy compares their shapes element-wise.
It starts with the trailing dimensions and works its way forward. Two
dimensions are compatible when
1) they are equal, or
2) one of them is 1
If these conditions are not met, a
``ValueError: operands could not be broadcast together`` exception is
thrown, indicating that the arrays have incompatible shapes. The size of
the resulting array is the size that is not 1 along each axis of the inputs.
Arrays do not need to have the same *number* of dimensions. For example,
if you have a ``256x256x3`` array of RGB values, and you want to scale
each color in the image by a different value, you can multiply the image
by a one-dimensional array with 3 values. Lining up the sizes of the
trailing axes of these arrays according to the broadcast rules, shows that
they are compatible::
Image (3d array): 256 x 256 x 3
Scale (1d array): 3
Result (3d array): 256 x 256 x 3
When either of the dimensions compared is one, the other is
used. In other words, dimensions with size 1 are stretched or "copied"
to match the other.
In the following example, both the ``A`` and ``B`` arrays have axes with
length one that are expanded to a larger size during the broadcast
operation::
A (4d array): 8 x 1 x 6 x 1
B (3d array): 7 x 1 x 5
Result (4d array): 8 x 7 x 6 x 5
Here are some more examples::
A (2d array): 5 x 4
B (1d array): 1
Result (2d array): 5 x 4
A (2d array): 5 x 4
B (1d array): 4
Result (2d array): 5 x 4
A (3d array): 15 x 3 x 5
B (3d array): 15 x 1 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 5
Result (3d array): 15 x 3 x 5
A (3d array): 15 x 3 x 5
B (2d array): 3 x 1
Result (3d array): 15 x 3 x 5
Here are examples of shapes that do not broadcast::
A (1d array): 3
B (1d array): 4 # trailing dimensions do not match
A (2d array): 2 x 1
B (3d array): 8 x 4 x 3 # second from last dimensions mismatched
An example of broadcasting in practice::
>>> x = np.arange(4)
>>> xx = x.reshape(4,1)
>>> y = np.ones(5)
>>> z = np.ones((3,4))
>>> x.shape
(4,)
>>> y.shape
(5,)
>>> x + y
ValueError: operands could not be broadcast together with shapes (4,) (5,)
>>> xx.shape
(4, 1)
>>> y.shape
(5,)
>>> (xx + y).shape
(4, 5)
>>> xx + y
array([[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.],
[ 3., 3., 3., 3., 3.],
[ 4., 4., 4., 4., 4.]])
>>> x.shape
(4,)
>>> z.shape
(3, 4)
>>> (x + z).shape
(3, 4)
>>> x + z
array([[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]])
Broadcasting provides a convenient way of taking the outer product (or
any other outer operation) of two arrays. The following example shows an
outer addition operation of two 1-d arrays::
>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b
array([[ 1., 2., 3.],
[ 11., 12., 13.],
[ 21., 22., 23.],
[ 31., 32., 33.]])
Here the ``newaxis`` index operator inserts a new axis into ``a``,
making it a two-dimensional ``4x1`` array. Combining the ``4x1`` array
with ``b``, which has shape ``(3,)``, yields a ``4x3`` array.
"""

View file

@ -0,0 +1,155 @@
"""
=============================
Byteswapping and byte order
=============================
Introduction to byte ordering and ndarrays
==========================================
The ``ndarray`` is an object that provide a python array interface to data
in memory.
It often happens that the memory that you want to view with an array is
not of the same byte ordering as the computer on which you are running
Python.
For example, I might be working on a computer with a little-endian CPU -
such as an Intel Pentium, but I have loaded some data from a file
written by a computer that is big-endian. Let's say I have loaded 4
bytes from a file written by a Sun (big-endian) computer. I know that
these 4 bytes represent two 16-bit integers. On a big-endian machine, a
two-byte integer is stored with the Most Significant Byte (MSB) first,
and then the Least Significant Byte (LSB). Thus the bytes are, in memory order:
#. MSB integer 1
#. LSB integer 1
#. MSB integer 2
#. LSB integer 2
Let's say the two integers were in fact 1 and 770. Because 770 = 256 *
3 + 2, the 4 bytes in memory would contain respectively: 0, 1, 3, 2.
The bytes I have loaded from the file would have these contents:
>>> big_end_buffer = bytearray([0,1,3,2])
>>> big_end_buffer
bytearray(b'\\x00\\x01\\x03\\x02')
We might want to use an ``ndarray`` to access these integers. In that
case, we can create an array around this memory, and tell numpy that
there are two integers, and that they are 16 bit and big-endian:
>>> import numpy as np
>>> big_end_arr = np.ndarray(shape=(2,),dtype='>i2', buffer=big_end_buffer)
>>> big_end_arr[0]
1
>>> big_end_arr[1]
770
Note the array ``dtype`` above of ``>i2``. The ``>`` means 'big-endian'
(``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'. For
example, if our data represented a single unsigned 4-byte little-endian
integer, the dtype string would be ``<u4``.
In fact, why don't we try that?
>>> little_end_u4 = np.ndarray(shape=(1,),dtype='<u4', buffer=big_end_buffer)
>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
True
Returning to our ``big_end_arr`` - in this case our underlying data is
big-endian (data endianness) and we've set the dtype to match (the dtype
is also big-endian). However, sometimes you need to flip these around.
.. warning::
Scalars currently do not include byte order information, so extracting
a scalar from an array will return an integer in native byte order.
Hence:
>>> big_end_arr[0].dtype.byteorder == little_end_u4[0].dtype.byteorder
True
Changing byte ordering
======================
As you can imagine from the introduction, there are two ways you can
affect the relationship between the byte ordering of the array and the
underlying memory it is looking at:
* Change the byte-ordering information in the array dtype so that it
interprets the underlying data as being in a different byte order.
This is the role of ``arr.newbyteorder()``
* Change the byte-ordering of the underlying data, leaving the dtype
interpretation as it was. This is what ``arr.byteswap()`` does.
The common situations in which you need to change byte ordering are:
#. Your data and dtype endianness don't match, and you want to change
the dtype so that it matches the data.
#. Your data and dtype endianness don't match, and you want to swap the
data so that they match the dtype
#. Your data and dtype endianness match, but you want the data swapped
and the dtype to reflect this
Data and dtype endianness don't match, change dtype to match data
-----------------------------------------------------------------
We make something where they don't match:
>>> wrong_end_dtype_arr = np.ndarray(shape=(2,),dtype='<i2', buffer=big_end_buffer)
>>> wrong_end_dtype_arr[0]
256
The obvious fix for this situation is to change the dtype so it gives
the correct endianness:
>>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder()
>>> fixed_end_dtype_arr[0]
1
Note the array has not changed in memory:
>>> fixed_end_dtype_arr.tobytes() == big_end_buffer
True
Data and type endianness don't match, change data to match dtype
----------------------------------------------------------------
You might want to do this if you need the data in memory to be a certain
ordering. For example you might be writing the memory out to a file
that needs a certain byte ordering.
>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
>>> fixed_end_mem_arr[0]
1
Now the array *has* changed in memory:
>>> fixed_end_mem_arr.tobytes() == big_end_buffer
False
Data and dtype endianness match, swap data and dtype
----------------------------------------------------
You may have a correctly specified array dtype, but you need the array
to have the opposite byte order in memory, and you want the dtype to
match so the array values make sense. In this case you just do both of
the previous operations:
>>> swapped_end_arr = big_end_arr.byteswap().newbyteorder()
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_buffer
False
An easier way of casting the data to a specific dtype and byte ordering
can be achieved with the ndarray astype method:
>>> swapped_end_arr = big_end_arr.astype('<i2')
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_buffer
False
"""

View file

@ -0,0 +1,417 @@
# -*- coding: utf-8 -*-
"""
=========
Constants
=========
.. currentmodule:: numpy
NumPy includes several constants:
%(constant_list)s
"""
#
# Note: the docstring is autogenerated.
#
import re
import textwrap
# Maintain same format as in numpy.add_newdocs
constants = []
def add_newdoc(module, name, doc):
constants.append((name, doc))
add_newdoc('numpy', 'pi',
"""
``pi = 3.1415926535897932384626433...``
References
----------
https://en.wikipedia.org/wiki/Pi
""")
add_newdoc('numpy', 'e',
"""
Euler's constant, base of natural logarithms, Napier's constant.
``e = 2.71828182845904523536028747135266249775724709369995...``
See Also
--------
exp : Exponential function
log : Natural logarithm
References
----------
https://en.wikipedia.org/wiki/E_%28mathematical_constant%29
""")
add_newdoc('numpy', 'euler_gamma',
"""
``γ = 0.5772156649015328606065120900824024310421...``
References
----------
https://en.wikipedia.org/wiki/Euler-Mascheroni_constant
""")
add_newdoc('numpy', 'inf',
"""
IEEE 754 floating point representation of (positive) infinity.
Returns
-------
y : float
A floating point representation of positive infinity.
See Also
--------
isinf : Shows which elements are positive or negative infinity
isposinf : Shows which elements are positive infinity
isneginf : Shows which elements are negative infinity
isnan : Shows which elements are Not a Number
isfinite : Shows which elements are finite (not one of Not a Number,
positive infinity and negative infinity)
Notes
-----
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
Also that positive infinity is not equivalent to negative infinity. But
infinity is equivalent to positive infinity.
`Inf`, `Infinity`, `PINF` and `infty` are aliases for `inf`.
Examples
--------
>>> np.inf
inf
>>> np.array([1]) / 0.
array([ Inf])
""")
add_newdoc('numpy', 'nan',
"""
IEEE 754 floating point representation of Not a Number (NaN).
Returns
-------
y : A floating point representation of Not a Number.
See Also
--------
isnan : Shows which elements are Not a Number.
isfinite : Shows which elements are finite (not one of
Not a Number, positive infinity and negative infinity)
Notes
-----
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
`NaN` and `NAN` are aliases of `nan`.
Examples
--------
>>> np.nan
nan
>>> np.log(-1)
nan
>>> np.log([-1, 1, 2])
array([ NaN, 0. , 0.69314718])
""")
add_newdoc('numpy', 'newaxis',
"""
A convenient alias for None, useful for indexing arrays.
See Also
--------
`numpy.doc.indexing`
Examples
--------
>>> newaxis is None
True
>>> x = np.arange(3)
>>> x
array([0, 1, 2])
>>> x[:, newaxis]
array([[0],
[1],
[2]])
>>> x[:, newaxis, newaxis]
array([[[0]],
[[1]],
[[2]]])
>>> x[:, newaxis] * x
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Outer product, same as ``outer(x, y)``:
>>> y = np.arange(3, 6)
>>> x[:, newaxis] * y
array([[ 0, 0, 0],
[ 3, 4, 5],
[ 6, 8, 10]])
``x[newaxis, :]`` is equivalent to ``x[newaxis]`` and ``x[None]``:
>>> x[newaxis, :].shape
(1, 3)
>>> x[newaxis].shape
(1, 3)
>>> x[None].shape
(1, 3)
>>> x[:, newaxis].shape
(3, 1)
""")
add_newdoc('numpy', 'NZERO',
"""
IEEE 754 floating point representation of negative zero.
Returns
-------
y : float
A floating point representation of negative zero.
See Also
--------
PZERO : Defines positive zero.
isinf : Shows which elements are positive or negative infinity.
isposinf : Shows which elements are positive infinity.
isneginf : Shows which elements are negative infinity.
isnan : Shows which elements are Not a Number.
isfinite : Shows which elements are finite - not one of
Not a Number, positive infinity and negative infinity.
Notes
-----
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). Negative zero is considered to be a finite number.
Examples
--------
>>> np.NZERO
-0.0
>>> np.PZERO
0.0
>>> np.isfinite([np.NZERO])
array([ True])
>>> np.isnan([np.NZERO])
array([False])
>>> np.isinf([np.NZERO])
array([False])
""")
add_newdoc('numpy', 'PZERO',
"""
IEEE 754 floating point representation of positive zero.
Returns
-------
y : float
A floating point representation of positive zero.
See Also
--------
NZERO : Defines negative zero.
isinf : Shows which elements are positive or negative infinity.
isposinf : Shows which elements are positive infinity.
isneginf : Shows which elements are negative infinity.
isnan : Shows which elements are Not a Number.
isfinite : Shows which elements are finite - not one of
Not a Number, positive infinity and negative infinity.
Notes
-----
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). Positive zero is considered to be a finite number.
Examples
--------
>>> np.PZERO
0.0
>>> np.NZERO
-0.0
>>> np.isfinite([np.PZERO])
array([ True])
>>> np.isnan([np.PZERO])
array([False])
>>> np.isinf([np.PZERO])
array([False])
""")
add_newdoc('numpy', 'NAN',
"""
IEEE 754 floating point representation of Not a Number (NaN).
`NaN` and `NAN` are equivalent definitions of `nan`. Please use
`nan` instead of `NAN`.
See Also
--------
nan
""")
add_newdoc('numpy', 'NaN',
"""
IEEE 754 floating point representation of Not a Number (NaN).
`NaN` and `NAN` are equivalent definitions of `nan`. Please use
`nan` instead of `NaN`.
See Also
--------
nan
""")
add_newdoc('numpy', 'NINF',
"""
IEEE 754 floating point representation of negative infinity.
Returns
-------
y : float
A floating point representation of negative infinity.
See Also
--------
isinf : Shows which elements are positive or negative infinity
isposinf : Shows which elements are positive infinity
isneginf : Shows which elements are negative infinity
isnan : Shows which elements are Not a Number
isfinite : Shows which elements are finite (not one of Not a Number,
positive infinity and negative infinity)
Notes
-----
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
Also that positive infinity is not equivalent to negative infinity. But
infinity is equivalent to positive infinity.
Examples
--------
>>> np.NINF
-inf
>>> np.log(0)
-inf
""")
add_newdoc('numpy', 'PINF',
"""
IEEE 754 floating point representation of (positive) infinity.
Use `inf` because `Inf`, `Infinity`, `PINF` and `infty` are aliases for
`inf`. For more details, see `inf`.
See Also
--------
inf
""")
add_newdoc('numpy', 'infty',
"""
IEEE 754 floating point representation of (positive) infinity.
Use `inf` because `Inf`, `Infinity`, `PINF` and `infty` are aliases for
`inf`. For more details, see `inf`.
See Also
--------
inf
""")
add_newdoc('numpy', 'Inf',
"""
IEEE 754 floating point representation of (positive) infinity.
Use `inf` because `Inf`, `Infinity`, `PINF` and `infty` are aliases for
`inf`. For more details, see `inf`.
See Also
--------
inf
""")
add_newdoc('numpy', 'Infinity',
"""
IEEE 754 floating point representation of (positive) infinity.
Use `inf` because `Inf`, `Infinity`, `PINF` and `infty` are aliases for
`inf`. For more details, see `inf`.
See Also
--------
inf
""")
if __doc__:
constants_str = []
constants.sort()
for name, doc in constants:
s = textwrap.dedent(doc).replace("\n", "\n ")
# Replace sections by rubrics
lines = s.split("\n")
new_lines = []
for line in lines:
m = re.match(r'^(\s+)[-=]+\s*$', line)
if m and new_lines:
prev = textwrap.dedent(new_lines.pop())
new_lines.append('%s.. rubric:: %s' % (m.group(1), prev))
new_lines.append('')
else:
new_lines.append(line)
s = "\n".join(new_lines)
# Done.
constants_str.append(""".. data:: %s\n %s""" % (name, s))
constants_str = "\n".join(constants_str)
__doc__ = __doc__ % dict(constant_list=constants_str)
del constants_str, name, doc
del line, lines, new_lines, m, s, prev
del constants, add_newdoc

View file

@ -0,0 +1,143 @@
"""
==============
Array Creation
==============
Introduction
============
There are 5 general mechanisms for creating arrays:
1) Conversion from other Python structures (e.g., lists, tuples)
2) Intrinsic numpy array creation objects (e.g., arange, ones, zeros,
etc.)
3) Reading arrays from disk, either from standard or custom formats
4) Creating arrays from raw bytes through the use of strings or buffers
5) Use of special library functions (e.g., random)
This section will not cover means of replicating, joining, or otherwise
expanding or mutating existing arrays. Nor will it cover creating object
arrays or structured arrays. Both of those are covered in their own sections.
Converting Python array_like Objects to NumPy Arrays
====================================================
In general, numerical data arranged in an array-like structure in Python can
be converted to arrays through the use of the array() function. The most
obvious examples are lists and tuples. See the documentation for array() for
details for its use. Some objects may support the array-protocol and allow
conversion to arrays this way. A simple way to find out if the object can be
converted to a numpy array using array() is simply to try it interactively and
see if it works! (The Python Way).
Examples: ::
>>> x = np.array([2,3,1,0])
>>> x = np.array([2, 3, 1, 0])
>>> x = np.array([[1,2.0],[0,0],(1+1j,3.)]) # note mix of tuple and lists,
and types
>>> x = np.array([[ 1.+0.j, 2.+0.j], [ 0.+0.j, 0.+0.j], [ 1.+1.j, 3.+0.j]])
Intrinsic NumPy Array Creation
==============================
NumPy has built-in functions for creating arrays from scratch:
zeros(shape) will create an array filled with 0 values with the specified
shape. The default dtype is float64. ::
>>> np.zeros((2, 3))
array([[ 0., 0., 0.], [ 0., 0., 0.]])
ones(shape) will create an array filled with 1 values. It is identical to
zeros in all other respects.
arange() will create arrays with regularly incrementing values. Check the
docstring for complete information on the various ways it can be used. A few
examples will be given here: ::
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(2, 10, dtype=float)
array([ 2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.arange(2, 3, 0.1)
array([ 2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9])
Note that there are some subtleties regarding the last usage that the user
should be aware of that are described in the arange docstring.
linspace() will create arrays with a specified number of elements, and
spaced equally between the specified beginning and end values. For
example: ::
>>> np.linspace(1., 4., 6)
array([ 1. , 1.6, 2.2, 2.8, 3.4, 4. ])
The advantage of this creation function is that one can guarantee the
number of elements and the starting and end point, which arange()
generally will not do for arbitrary start, stop, and step values.
indices() will create a set of arrays (stacked as a one-higher dimensioned
array), one per dimension with each representing variation in that dimension.
An example illustrates much better than a verbal description: ::
>>> np.indices((3,3))
array([[[0, 0, 0], [1, 1, 1], [2, 2, 2]], [[0, 1, 2], [0, 1, 2], [0, 1, 2]]])
This is particularly useful for evaluating functions of multiple dimensions on
a regular grid.
Reading Arrays From Disk
========================
This is presumably the most common case of large array creation. The details,
of course, depend greatly on the format of data on disk and so this section
can only give general pointers on how to handle various formats.
Standard Binary Formats
-----------------------
Various fields have standard formats for array data. The following lists the
ones with known python libraries to read them and return numpy arrays (there
may be others for which it is possible to read and convert to numpy arrays so
check the last section as well)
::
HDF5: h5py
FITS: Astropy
Examples of formats that cannot be read directly but for which it is not hard to
convert are those formats supported by libraries like PIL (able to read and
write many image formats such as jpg, png, etc).
Common ASCII Formats
------------------------
Comma Separated Value files (CSV) are widely used (and an export and import
option for programs like Excel). There are a number of ways of reading these
files in Python. There are CSV functions in Python and functions in pylab
(part of matplotlib).
More generic ascii files can be read using the io package in scipy.
Custom Binary Formats
---------------------
There are a variety of approaches one can use. If the file has a relatively
simple format then one can write a simple I/O library and use the numpy
fromfile() function and .tofile() method to read and write numpy arrays
directly (mind your byteorder though!) If a good C or C++ library exists that
read the data, one can wrap that library with a variety of techniques though
that certainly is much more work and requires significantly more advanced
knowledge to interface with C or C++.
Use of Special Libraries
------------------------
There are libraries that can be used to generate arrays for special purposes
and it isn't possible to enumerate all of them. The most common uses are use
of the many array generation functions in random that can generate arrays of
random values, and some utility functions to generate special matrices (e.g.
diagonal).
"""

View file

@ -0,0 +1,271 @@
""".. _dispatch_mechanism:
Numpy's dispatch mechanism, introduced in numpy version v1.16 is the
recommended approach for writing custom N-dimensional array containers that are
compatible with the numpy API and provide custom implementations of numpy
functionality. Applications include `dask <http://dask.pydata.org>`_ arrays, an
N-dimensional array distributed across multiple nodes, and `cupy
<https://docs-cupy.chainer.org/en/stable/>`_ arrays, an N-dimensional array on
a GPU.
To get a feel for writing custom array containers, we'll begin with a simple
example that has rather narrow utility but illustrates the concepts involved.
>>> import numpy as np
>>> class DiagonalArray:
... def __init__(self, N, value):
... self._N = N
... self._i = value
... def __repr__(self):
... return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
... def __array__(self):
... return self._i * np.eye(self._N)
...
Our custom array can be instantiated like:
>>> arr = DiagonalArray(5, 1)
>>> arr
DiagonalArray(N=5, value=1)
We can convert to a numpy array using :func:`numpy.array` or
:func:`numpy.asarray`, which will call its ``__array__`` method to obtain a
standard ``numpy.ndarray``.
>>> np.asarray(arr)
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
If we operate on ``arr`` with a numpy function, numpy will again use the
``__array__`` interface to convert it to an array and then apply the function
in the usual way.
>>> np.multiply(arr, 2)
array([[2., 0., 0., 0., 0.],
[0., 2., 0., 0., 0.],
[0., 0., 2., 0., 0.],
[0., 0., 0., 2., 0.],
[0., 0., 0., 0., 2.]])
Notice that the return type is a standard ``numpy.ndarray``.
>>> type(arr)
numpy.ndarray
How can we pass our custom array type through this function? Numpy allows a
class to indicate that it would like to handle computations in a custom-defined
way through the interfaces ``__array_ufunc__`` and ``__array_function__``. Let's
take one at a time, starting with ``_array_ufunc__``. This method covers
:ref:`ufuncs`, a class of functions that includes, for example,
:func:`numpy.multiply` and :func:`numpy.sin`.
The ``__array_ufunc__`` receives:
- ``ufunc``, a function like ``numpy.multiply``
- ``method``, a string, differentiating between ``numpy.multiply(...)`` and
variants like ``numpy.multiply.outer``, ``numpy.multiply.accumulate``, and so
on. For the common case, ``numpy.multiply(...)``, ``method == '__call__'``.
- ``inputs``, which could be a mixture of different types
- ``kwargs``, keyword arguments passed to the function
For this example we will only handle the method ``__call__``.
>>> from numbers import Number
>>> class DiagonalArray:
... def __init__(self, N, value):
... self._N = N
... self._i = value
... def __repr__(self):
... return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
... def __array__(self):
... return self._i * np.eye(self._N)
... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
... if method == '__call__':
... N = None
... scalars = []
... for input in inputs:
... if isinstance(input, Number):
... scalars.append(input)
... elif isinstance(input, self.__class__):
... scalars.append(input._i)
... if N is not None:
... if N != self._N:
... raise TypeError("inconsistent sizes")
... else:
... N = self._N
... else:
... return NotImplemented
... return self.__class__(N, ufunc(*scalars, **kwargs))
... else:
... return NotImplemented
...
Now our custom array type passes through numpy functions.
>>> arr = DiagonalArray(5, 1)
>>> np.multiply(arr, 3)
DiagonalArray(N=5, value=3)
>>> np.add(arr, 3)
DiagonalArray(N=5, value=4)
>>> np.sin(arr)
DiagonalArray(N=5, value=0.8414709848078965)
At this point ``arr + 3`` does not work.
>>> arr + 3
TypeError: unsupported operand type(s) for *: 'DiagonalArray' and 'int'
To support it, we need to define the Python interfaces ``__add__``, ``__lt__``,
and so on to dispatch to the corresponding ufunc. We can achieve this
conveniently by inheriting from the mixin
:class:`~numpy.lib.mixins.NDArrayOperatorsMixin`.
>>> import numpy.lib.mixins
>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin):
... def __init__(self, N, value):
... self._N = N
... self._i = value
... def __repr__(self):
... return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
... def __array__(self):
... return self._i * np.eye(self._N)
... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
... if method == '__call__':
... N = None
... scalars = []
... for input in inputs:
... if isinstance(input, Number):
... scalars.append(input)
... elif isinstance(input, self.__class__):
... scalars.append(input._i)
... if N is not None:
... if N != self._N:
... raise TypeError("inconsistent sizes")
... else:
... N = self._N
... else:
... return NotImplemented
... return self.__class__(N, ufunc(*scalars, **kwargs))
... else:
... return NotImplemented
...
>>> arr = DiagonalArray(5, 1)
>>> arr + 3
DiagonalArray(N=5, value=4)
>>> arr > 0
DiagonalArray(N=5, value=True)
Now let's tackle ``__array_function__``. We'll create dict that maps numpy
functions to our custom variants.
>>> HANDLED_FUNCTIONS = {}
>>> class DiagonalArray(numpy.lib.mixins.NDArrayOperatorsMixin):
... def __init__(self, N, value):
... self._N = N
... self._i = value
... def __repr__(self):
... return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
... def __array__(self):
... return self._i * np.eye(self._N)
... def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
... if method == '__call__':
... N = None
... scalars = []
... for input in inputs:
... # In this case we accept only scalar numbers or DiagonalArrays.
... if isinstance(input, Number):
... scalars.append(input)
... elif isinstance(input, self.__class__):
... scalars.append(input._i)
... if N is not None:
... if N != self._N:
... raise TypeError("inconsistent sizes")
... else:
... N = self._N
... else:
... return NotImplemented
... return self.__class__(N, ufunc(*scalars, **kwargs))
... else:
... return NotImplemented
... def __array_function__(self, func, types, args, kwargs):
... if func not in HANDLED_FUNCTIONS:
... return NotImplemented
... # Note: this allows subclasses that don't override
... # __array_function__ to handle DiagonalArray objects.
... if not all(issubclass(t, self.__class__) for t in types):
... return NotImplemented
... return HANDLED_FUNCTIONS[func](*args, **kwargs)
...
A convenient pattern is to define a decorator ``implements`` that can be used
to add functions to ``HANDLED_FUNCTIONS``.
>>> def implements(np_function):
... "Register an __array_function__ implementation for DiagonalArray objects."
... def decorator(func):
... HANDLED_FUNCTIONS[np_function] = func
... return func
... return decorator
...
Now we write implementations of numpy functions for ``DiagonalArray``.
For completeness, to support the usage ``arr.sum()`` add a method ``sum`` that
calls ``numpy.sum(self)``, and the same for ``mean``.
>>> @implements(np.sum)
... def sum(arr):
... "Implementation of np.sum for DiagonalArray objects"
... return arr._i * arr._N
...
>>> @implements(np.mean)
... def mean(arr):
... "Implementation of np.mean for DiagonalArray objects"
... return arr._i / arr._N
...
>>> arr = DiagonalArray(5, 1)
>>> np.sum(arr)
5
>>> np.mean(arr)
0.2
If the user tries to use any numpy functions not included in
``HANDLED_FUNCTIONS``, a ``TypeError`` will be raised by numpy, indicating that
this operation is not supported. For example, concatenating two
``DiagonalArrays`` does not produce another diagonal array, so it is not
supported.
>>> np.concatenate([arr, arr])
TypeError: no implementation found for 'numpy.concatenate' on types that implement __array_function__: [<class '__main__.DiagonalArray'>]
Additionally, our implementations of ``sum`` and ``mean`` do not accept the
optional arguments that numpy's implementation does.
>>> np.sum(arr, axis=0)
TypeError: sum() got an unexpected keyword argument 'axis'
The user always has the option of converting to a normal ``numpy.ndarray`` with
:func:`numpy.asarray` and using standard numpy from there.
>>> np.concatenate([np.asarray(arr), np.asarray(arr)])
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.],
[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Refer to the `dask source code <https://github.com/dask/dask>`_ and
`cupy source code <https://github.com/cupy/cupy>`_ for more fully-worked
examples of custom array containers.
See also `NEP 18 <http://www.numpy.org/neps/nep-0018-array-function-protocol.html>`_.
"""

View file

@ -0,0 +1,475 @@
"""
========
Glossary
========
.. glossary::
along an axis
Axes are defined for arrays with more than one dimension. A
2-dimensional array has two corresponding axes: the first running
vertically downwards across rows (axis 0), and the second running
horizontally across columns (axis 1).
Many operations can take place along one of these axes. For example,
we can sum each row of an array, in which case we operate along
columns, or axis 1::
>>> x = np.arange(12).reshape((3,4))
>>> x
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> x.sum(axis=1)
array([ 6, 22, 38])
array
A homogeneous container of numerical elements. Each element in the
array occupies a fixed amount of memory (hence homogeneous), and
can be a numerical element of a single type (such as float, int
or complex) or a combination (such as ``(float, int, float)``). Each
array has an associated data-type (or ``dtype``), which describes
the numerical type of its elements::
>>> x = np.array([1, 2, 3], float)
>>> x
array([ 1., 2., 3.])
>>> x.dtype # floating point number, 64 bits of memory per element
dtype('float64')
# More complicated data type: each array element is a combination of
# and integer and a floating point number
>>> np.array([(1, 2.0), (3, 4.0)], dtype=[('x', int), ('y', float)])
array([(1, 2.0), (3, 4.0)],
dtype=[('x', '<i4'), ('y', '<f8')])
Fast element-wise operations, called a :term:`ufunc`, operate on arrays.
array_like
Any sequence that can be interpreted as an ndarray. This includes
nested lists, tuples, scalars and existing arrays.
attribute
A property of an object that can be accessed using ``obj.attribute``,
e.g., ``shape`` is an attribute of an array::
>>> x = np.array([1, 2, 3])
>>> x.shape
(3,)
big-endian
When storing a multi-byte value in memory as a sequence of bytes, the
sequence addresses/sends/stores the most significant byte first (lowest
address) and the least significant byte last (highest address). Common in
micro-processors and used for transmission of data over network protocols.
BLAS
`Basic Linear Algebra Subprograms <https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms>`_
broadcast
NumPy can do operations on arrays whose shapes are mismatched::
>>> x = np.array([1, 2])
>>> y = np.array([[3], [4]])
>>> x
array([1, 2])
>>> y
array([[3],
[4]])
>>> x + y
array([[4, 5],
[5, 6]])
See `numpy.doc.broadcasting` for more information.
C order
See `row-major`
column-major
A way to represent items in a N-dimensional array in the 1-dimensional
computer memory. In column-major order, the leftmost index "varies the
fastest": for example the array::
[[1, 2, 3],
[4, 5, 6]]
is represented in the column-major order as::
[1, 4, 2, 5, 3, 6]
Column-major order is also known as the Fortran order, as the Fortran
programming language uses it.
decorator
An operator that transforms a function. For example, a ``log``
decorator may be defined to print debugging information upon
function execution::
>>> def log(f):
... def new_logging_func(*args, **kwargs):
... print("Logging call with parameters:", args, kwargs)
... return f(*args, **kwargs)
...
... return new_logging_func
Now, when we define a function, we can "decorate" it using ``log``::
>>> @log
... def add(a, b):
... return a + b
Calling ``add`` then yields:
>>> add(1, 2)
Logging call with parameters: (1, 2) {}
3
dictionary
Resembling a language dictionary, which provides a mapping between
words and descriptions thereof, a Python dictionary is a mapping
between two objects::
>>> x = {1: 'one', 'two': [1, 2]}
Here, `x` is a dictionary mapping keys to values, in this case
the integer 1 to the string "one", and the string "two" to
the list ``[1, 2]``. The values may be accessed using their
corresponding keys::
>>> x[1]
'one'
>>> x['two']
[1, 2]
Note that dictionaries are not stored in any specific order. Also,
most mutable (see *immutable* below) objects, such as lists, may not
be used as keys.
For more information on dictionaries, read the
`Python tutorial <https://docs.python.org/tutorial/>`_.
field
In a :term:`structured data type`, each sub-type is called a `field`.
The `field` has a name (a string), a type (any valid dtype), and
an optional `title`. See :ref:`arrays.dtypes`
Fortran order
See `column-major`
flattened
Collapsed to a one-dimensional array. See `numpy.ndarray.flatten`
for details.
homogenous
Describes a block of memory comprised of blocks, each block comprised of
items and of the same size, and blocks are interpreted in exactly the
same way. In the simplest case each block contains a single item, for
instance int32 or float64.
immutable
An object that cannot be modified after execution is called
immutable. Two common examples are strings and tuples.
instance
A class definition gives the blueprint for constructing an object::
>>> class House:
... wall_colour = 'white'
Yet, we have to *build* a house before it exists::
>>> h = House() # build a house
Now, ``h`` is called a ``House`` instance. An instance is therefore
a specific realisation of a class.
iterable
A sequence that allows "walking" (iterating) over items, typically
using a loop such as::
>>> x = [1, 2, 3]
>>> [item**2 for item in x]
[1, 4, 9]
It is often used in combination with ``enumerate``::
>>> keys = ['a','b','c']
>>> for n, k in enumerate(keys):
... print("Key %d: %s" % (n, k))
...
Key 0: a
Key 1: b
Key 2: c
itemsize
The size of the dtype element in bytes.
list
A Python container that can hold any number of objects or items.
The items do not have to be of the same type, and can even be
lists themselves::
>>> x = [2, 2.0, "two", [2, 2.0]]
The list `x` contains 4 items, each which can be accessed individually::
>>> x[2] # the string 'two'
'two'
>>> x[3] # a list, containing an integer 2 and a float 2.0
[2, 2.0]
It is also possible to select more than one item at a time,
using *slicing*::
>>> x[0:2] # or, equivalently, x[:2]
[2, 2.0]
In code, arrays are often conveniently expressed as nested lists::
>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
[3, 4]])
For more information, read the section on lists in the `Python
tutorial <https://docs.python.org/tutorial/>`_. For a mapping
type (key-value), see *dictionary*.
little-endian
When storing a multi-byte value in memory as a sequence of bytes, the
sequence addresses/sends/stores the least significant byte first (lowest
address) and the most significant byte last (highest address). Common in
x86 processors.
mask
A boolean array, used to select only certain elements for an operation::
>>> x = np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> mask = (x > 2)
>>> mask
array([False, False, False, True, True])
>>> x[mask] = -1
>>> x
array([ 0, 1, 2, -1, -1])
masked array
Array that suppressed values indicated by a mask::
>>> x = np.ma.masked_array([np.nan, 2, np.nan], [True, False, True])
>>> x
masked_array(data = [-- 2.0 --],
mask = [ True False True],
fill_value = 1e+20)
>>> x + [1, 2, 3]
masked_array(data = [-- 4.0 --],
mask = [ True False True],
fill_value = 1e+20)
Masked arrays are often used when operating on arrays containing
missing or invalid entries.
matrix
A 2-dimensional ndarray that preserves its two-dimensional nature
throughout operations. It has certain special operations, such as ``*``
(matrix multiplication) and ``**`` (matrix power), defined::
>>> x = np.mat([[1, 2], [3, 4]])
>>> x
matrix([[1, 2],
[3, 4]])
>>> x**2
matrix([[ 7, 10],
[15, 22]])
method
A function associated with an object. For example, each ndarray has a
method called ``repeat``::
>>> x = np.array([1, 2, 3])
>>> x.repeat(2)
array([1, 1, 2, 2, 3, 3])
ndarray
See *array*.
record array
An :term:`ndarray` with :term:`structured data type` which has been
subclassed as ``np.recarray`` and whose dtype is of type ``np.record``,
making the fields of its data type to be accessible by attribute.
reference
If ``a`` is a reference to ``b``, then ``(a is b) == True``. Therefore,
``a`` and ``b`` are different names for the same Python object.
row-major
A way to represent items in a N-dimensional array in the 1-dimensional
computer memory. In row-major order, the rightmost index "varies
the fastest": for example the array::
[[1, 2, 3],
[4, 5, 6]]
is represented in the row-major order as::
[1, 2, 3, 4, 5, 6]
Row-major order is also known as the C order, as the C programming
language uses it. New NumPy arrays are by default in row-major order.
self
Often seen in method signatures, ``self`` refers to the instance
of the associated class. For example:
>>> class Paintbrush:
... color = 'blue'
...
... def paint(self):
... print("Painting the city %s!" % self.color)
...
>>> p = Paintbrush()
>>> p.color = 'red'
>>> p.paint() # self refers to 'p'
Painting the city red!
slice
Used to select only certain elements from a sequence:
>>> x = range(5)
>>> x
[0, 1, 2, 3, 4]
>>> x[1:3] # slice from 1 to 3 (excluding 3 itself)
[1, 2]
>>> x[1:5:2] # slice from 1 to 5, but skipping every second element
[1, 3]
>>> x[::-1] # slice a sequence in reverse
[4, 3, 2, 1, 0]
Arrays may have more than one dimension, each which can be sliced
individually:
>>> x = np.array([[1, 2], [3, 4]])
>>> x
array([[1, 2],
[3, 4]])
>>> x[:, 1]
array([2, 4])
structure
See :term:`structured data type`
structured data type
A data type composed of other datatypes
subarray data type
A :term:`structured data type` may contain a :term:`ndarray` with its
own dtype and shape:
>>> dt = np.dtype([('a', np.int32), ('b', np.float32, (3,))])
>>> np.zeros(3, dtype=dt)
array([(0, [0., 0., 0.]), (0, [0., 0., 0.]), (0, [0., 0., 0.])],
dtype=[('a', '<i4'), ('b', '<f4', (3,))])
title
In addition to field names, structured array fields may have an
associated :ref:`title <titles>` which is an alias to the name and is
commonly used for plotting.
tuple
A sequence that may contain a variable number of types of any
kind. A tuple is immutable, i.e., once constructed it cannot be
changed. Similar to a list, it can be indexed and sliced::
>>> x = (1, 'one', [1, 2])
>>> x
(1, 'one', [1, 2])
>>> x[0]
1
>>> x[:2]
(1, 'one')
A useful concept is "tuple unpacking", which allows variables to
be assigned to the contents of a tuple::
>>> x, y = (1, 2)
>>> x, y = 1, 2
This is often used when a function returns multiple values:
>>> def return_many():
... return 1, 'alpha', None
>>> a, b, c = return_many()
>>> a, b, c
(1, 'alpha', None)
>>> a
1
>>> b
'alpha'
ufunc
Universal function. A fast element-wise, :term:`vectorized
<vectorization>` array operation. Examples include ``add``, ``sin`` and
``logical_or``.
vectorization
Optimizing a looping block by specialized code. In a traditional sense,
vectorization performs the same operation on multiple elements with
fixed strides between them via specialized hardware. Compilers know how
to take advantage of well-constructed loops to implement such
optimizations. NumPy uses :ref:`vectorization <whatis-vectorization>`
to mean any optimization via specialized code performing the same
operations on multiple elements, typically achieving speedups by
avoiding some of the overhead in looking up and converting the elements.
view
An array that does not own its data, but refers to another array's
data instead. For example, we may create a view that only shows
every second element of another array::
>>> x = np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> y = x[::2]
>>> y
array([0, 2, 4])
>>> x[0] = 3 # changing x changes y as well, since y is a view on x
>>> y
array([3, 2, 4])
wrapper
Python is a high-level (highly abstracted, or English-like) language.
This abstraction comes at a price in execution speed, and sometimes
it becomes necessary to use lower level languages to do fast
computations. A wrapper is code that provides a bridge between
high and the low level languages, allowing, e.g., Python to execute
code written in C or Fortran.
Examples include ctypes, SWIG and Cython (which wraps C and C++)
and f2py (which wraps Fortran).
"""

View file

@ -0,0 +1,456 @@
"""
==============
Array indexing
==============
Array indexing refers to any use of the square brackets ([]) to index
array values. There are many options to indexing, which give numpy
indexing great power, but with power comes some complexity and the
potential for confusion. This section is just an overview of the
various options and issues related to indexing. Aside from single
element indexing, the details on most of these options are to be
found in related sections.
Assignment vs referencing
=========================
Most of the following examples show the use of indexing when
referencing data in an array. The examples work just as well
when assigning to an array. See the section at the end for
specific examples and explanations on how assignments work.
Single element indexing
=======================
Single element indexing for a 1-D array is what one expects. It work
exactly like that for other standard Python sequences. It is 0-based,
and accepts negative indices for indexing from the end of the array. ::
>>> x = np.arange(10)
>>> x[2]
2
>>> x[-2]
8
Unlike lists and tuples, numpy arrays support multidimensional indexing
for multidimensional arrays. That means that it is not necessary to
separate each dimension's index into its own set of square brackets. ::
>>> x.shape = (2,5) # now x is 2-dimensional
>>> x[1,3]
8
>>> x[1,-1]
9
Note that if one indexes a multidimensional array with fewer indices
than dimensions, one gets a subdimensional array. For example: ::
>>> x[0]
array([0, 1, 2, 3, 4])
That is, each index specified selects the array corresponding to the
rest of the dimensions selected. In the above example, choosing 0
means that the remaining dimension of length 5 is being left unspecified,
and that what is returned is an array of that dimensionality and size.
It must be noted that the returned array is not a copy of the original,
but points to the same values in memory as does the original array.
In this case, the 1-D array at the first position (0) is returned.
So using a single index on the returned array, results in a single
element being returned. That is: ::
>>> x[0][2]
2
So note that ``x[0,2] = x[0][2]`` though the second case is more
inefficient as a new temporary array is created after the first index
that is subsequently indexed by 2.
Note to those used to IDL or Fortran memory order as it relates to
indexing. NumPy uses C-order indexing. That means that the last
index usually represents the most rapidly changing memory location,
unlike Fortran or IDL, where the first index represents the most
rapidly changing location in memory. This difference represents a
great potential for confusion.
Other indexing options
======================
It is possible to slice and stride arrays to extract arrays of the
same number of dimensions, but of different sizes than the original.
The slicing and striding works exactly the same way it does for lists
and tuples except that they can be applied to multiple dimensions as
well. A few examples illustrates best: ::
>>> x = np.arange(10)
>>> x[2:5]
array([2, 3, 4])
>>> x[:-7]
array([0, 1, 2])
>>> x[1:7:2]
array([1, 3, 5])
>>> y = np.arange(35).reshape(5,7)
>>> y[1:5:2,::3]
array([[ 7, 10, 13],
[21, 24, 27]])
Note that slices of arrays do not copy the internal array data but
only produce new views of the original data. This is different from
list or tuple slicing and an explicit ``copy()`` is recommended if
the original data is not required anymore.
It is possible to index arrays with other arrays for the purposes of
selecting lists of values out of arrays into new arrays. There are
two different ways of accomplishing this. One uses one or more arrays
of index values. The other involves giving a boolean array of the proper
shape to indicate the values to be selected. Index arrays are a very
powerful tool that allow one to avoid looping over individual elements in
arrays and thus greatly improve performance.
It is possible to use special features to effectively increase the
number of dimensions in an array through indexing so the resulting
array acquires the shape needed for use in an expression or with a
specific function.
Index arrays
============
NumPy arrays may be indexed with other arrays (or any other sequence-
like object that can be converted to an array, such as lists, with the
exception of tuples; see the end of this document for why this is). The
use of index arrays ranges from simple, straightforward cases to
complex, hard-to-understand cases. For all cases of index arrays, what
is returned is a copy of the original data, not a view as one gets for
slices.
Index arrays must be of integer type. Each value in the array indicates
which value in the array to use in place of the index. To illustrate: ::
>>> x = np.arange(10,1,-1)
>>> x
array([10, 9, 8, 7, 6, 5, 4, 3, 2])
>>> x[np.array([3, 3, 1, 8])]
array([7, 7, 9, 2])
The index array consisting of the values 3, 3, 1 and 8 correspondingly
create an array of length 4 (same as the index array) where each index
is replaced by the value the index array has in the array being indexed.
Negative values are permitted and work as they do with single indices
or slices: ::
>>> x[np.array([3,3,-3,8])]
array([7, 7, 4, 2])
It is an error to have index values out of bounds: ::
>>> x[np.array([3, 3, 20, 8])]
<type 'exceptions.IndexError'>: index 20 out of bounds 0<=index<9
Generally speaking, what is returned when index arrays are used is
an array with the same shape as the index array, but with the type
and values of the array being indexed. As an example, we can use a
multidimensional index array instead: ::
>>> x[np.array([[1,1],[2,3]])]
array([[9, 9],
[8, 7]])
Indexing Multi-dimensional arrays
=================================
Things become more complex when multidimensional arrays are indexed,
particularly with multidimensional index arrays. These tend to be
more unusual uses, but they are permitted, and they are useful for some
problems. We'll start with the simplest multidimensional case (using
the array y from the previous examples): ::
>>> y[np.array([0,2,4]), np.array([0,1,2])]
array([ 0, 15, 30])
In this case, if the index arrays have a matching shape, and there is
an index array for each dimension of the array being indexed, the
resultant array has the same shape as the index arrays, and the values
correspond to the index set for each position in the index arrays. In
this example, the first index value is 0 for both index arrays, and
thus the first value of the resultant array is y[0,0]. The next value
is y[2,1], and the last is y[4,2].
If the index arrays do not have the same shape, there is an attempt to
broadcast them to the same shape. If they cannot be broadcast to the
same shape, an exception is raised: ::
>>> y[np.array([0,2,4]), np.array([0,1])]
<type 'exceptions.ValueError'>: shape mismatch: objects cannot be
broadcast to a single shape
The broadcasting mechanism permits index arrays to be combined with
scalars for other indices. The effect is that the scalar value is used
for all the corresponding values of the index arrays: ::
>>> y[np.array([0,2,4]), 1]
array([ 1, 15, 29])
Jumping to the next level of complexity, it is possible to only
partially index an array with index arrays. It takes a bit of thought
to understand what happens in such cases. For example if we just use
one index array with y: ::
>>> y[np.array([0,2,4])]
array([[ 0, 1, 2, 3, 4, 5, 6],
[14, 15, 16, 17, 18, 19, 20],
[28, 29, 30, 31, 32, 33, 34]])
What results is the construction of a new array where each value of
the index array selects one row from the array being indexed and the
resultant array has the resulting shape (number of index elements,
size of row).
An example of where this may be useful is for a color lookup table
where we want to map the values of an image into RGB triples for
display. The lookup table could have a shape (nlookup, 3). Indexing
such an array with an image with shape (ny, nx) with dtype=np.uint8
(or any integer type so long as values are with the bounds of the
lookup table) will result in an array of shape (ny, nx, 3) where a
triple of RGB values is associated with each pixel location.
In general, the shape of the resultant array will be the concatenation
of the shape of the index array (or the shape that all the index arrays
were broadcast to) with the shape of any unused dimensions (those not
indexed) in the array being indexed.
Boolean or "mask" index arrays
==============================
Boolean arrays used as indices are treated in a different manner
entirely than index arrays. Boolean arrays must be of the same shape
as the initial dimensions of the array being indexed. In the
most straightforward case, the boolean array has the same shape: ::
>>> b = y>20
>>> y[b]
array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
Unlike in the case of integer index arrays, in the boolean case, the
result is a 1-D array containing all the elements in the indexed array
corresponding to all the true elements in the boolean array. The
elements in the indexed array are always iterated and returned in
:term:`row-major` (C-style) order. The result is also identical to
``y[np.nonzero(b)]``. As with index arrays, what is returned is a copy
of the data, not a view as one gets with slices.
The result will be multidimensional if y has more dimensions than b.
For example: ::
>>> b[:,5] # use a 1-D boolean whose first dim agrees with the first dim of y
array([False, False, False, True, True])
>>> y[b[:,5]]
array([[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34]])
Here the 4th and 5th rows are selected from the indexed array and
combined to make a 2-D array.
In general, when the boolean array has fewer dimensions than the array
being indexed, this is equivalent to y[b, ...], which means
y is indexed by b followed by as many : as are needed to fill
out the rank of y.
Thus the shape of the result is one dimension containing the number
of True elements of the boolean array, followed by the remaining
dimensions of the array being indexed.
For example, using a 2-D boolean array of shape (2,3)
with four True elements to select rows from a 3-D array of shape
(2,3,5) results in a 2-D result of shape (4,5): ::
>>> x = np.arange(30).reshape(2,3,5)
>>> x
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]],
[[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]]])
>>> b = np.array([[True, True, False], [False, True, True]])
>>> x[b]
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29]])
For further details, consult the numpy reference documentation on array indexing.
Combining index arrays with slices
==================================
Index arrays may be combined with slices. For example: ::
>>> y[np.array([0, 2, 4]), 1:3]
array([[ 1, 2],
[15, 16],
[29, 30]])
In effect, the slice and index array operation are independent.
The slice operation extracts columns with index 1 and 2,
(i.e. the 2nd and 3rd columns),
followed by the index array operation which extracts rows with
index 0, 2 and 4 (i.e the first, third and fifth rows).
This is equivalent to::
>>> y[:, 1:3][np.array([0, 2, 4]), :]
array([[ 1, 2],
[15, 16],
[29, 30]])
Likewise, slicing can be combined with broadcasted boolean indices: ::
>>> b = y > 20
>>> b
array([[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[False, False, False, False, False, False, False],
[ True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True]])
>>> y[b[:,5],1:3]
array([[22, 23],
[29, 30]])
Structural indexing tools
=========================
To facilitate easy matching of array shapes with expressions and in
assignments, the np.newaxis object can be used within array indices
to add new dimensions with a size of 1. For example: ::
>>> y.shape
(5, 7)
>>> y[:,np.newaxis,:].shape
(5, 1, 7)
Note that there are no new elements in the array, just that the
dimensionality is increased. This can be handy to combine two
arrays in a way that otherwise would require explicitly reshaping
operations. For example: ::
>>> x = np.arange(5)
>>> x[:,np.newaxis] + x[np.newaxis,:]
array([[0, 1, 2, 3, 4],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
The ellipsis syntax maybe used to indicate selecting in full any
remaining unspecified dimensions. For example: ::
>>> z = np.arange(81).reshape(3,3,3,3)
>>> z[1,...,2]
array([[29, 32, 35],
[38, 41, 44],
[47, 50, 53]])
This is equivalent to: ::
>>> z[1,:,:,2]
array([[29, 32, 35],
[38, 41, 44],
[47, 50, 53]])
Assigning values to indexed arrays
==================================
As mentioned, one can select a subset of an array to assign to using
a single index, slices, and index and mask arrays. The value being
assigned to the indexed array must be shape consistent (the same shape
or broadcastable to the shape the index produces). For example, it is
permitted to assign a constant to a slice: ::
>>> x = np.arange(10)
>>> x[2:7] = 1
or an array of the right size: ::
>>> x[2:7] = np.arange(5)
Note that assignments may result in changes if assigning
higher types to lower types (like floats to ints) or even
exceptions (assigning complex to floats or ints): ::
>>> x[1] = 1.2
>>> x[1]
1
>>> x[1] = 1.2j
TypeError: can't convert complex to int
Unlike some of the references (such as array and mask indices)
assignments are always made to the original data in the array
(indeed, nothing else would make sense!). Note though, that some
actions may not work as one may naively expect. This particular
example is often surprising to people: ::
>>> x = np.arange(0, 50, 10)
>>> x
array([ 0, 10, 20, 30, 40])
>>> x[np.array([1, 1, 3, 1])] += 1
>>> x
array([ 0, 11, 20, 31, 40])
Where people expect that the 1st location will be incremented by 3.
In fact, it will only be incremented by 1. The reason is because
a new array is extracted from the original (as a temporary) containing
the values at 1, 1, 3, 1, then the value 1 is added to the temporary,
and then the temporary is assigned back to the original array. Thus
the value of the array at x[1]+1 is assigned to x[1] three times,
rather than being incremented 3 times.
Dealing with variable numbers of indices within programs
========================================================
The index syntax is very powerful but limiting when dealing with
a variable number of indices. For example, if you want to write
a function that can handle arguments with various numbers of
dimensions without having to write special case code for each
number of possible dimensions, how can that be done? If one
supplies to the index a tuple, the tuple will be interpreted
as a list of indices. For example (using the previous definition
for the array z): ::
>>> indices = (1,1,1,1)
>>> z[indices]
40
So one can use code to construct tuples of any number of indices
and then use these within an index.
Slices can be specified within programs by using the slice() function
in Python. For example: ::
>>> indices = (1,1,1,slice(0,2)) # same as [1,1,1,0:2]
>>> z[indices]
array([39, 40])
Likewise, ellipsis can be specified by code by using the Ellipsis
object: ::
>>> indices = (1, Ellipsis, 1) # same as [1,...,1]
>>> z[indices]
array([[28, 31, 34],
[37, 40, 43],
[46, 49, 52]])
For this reason it is possible to use the output from the np.nonzero()
function directly as an index since it always returns a tuple of index
arrays.
Because the special treatment of tuples, they are not automatically
converted to an array as a list would be. As an example: ::
>>> z[[1,1,1,1]] # produces a large array
array([[[[27, 28, 29],
[30, 31, 32], ...
>>> z[(1,1,1,1)] # returns a single value
40
"""

View file

@ -0,0 +1,162 @@
"""
===============
Array Internals
===============
Internal organization of numpy arrays
=====================================
It helps to understand a bit about how numpy arrays are handled under the covers to help understand numpy better. This section will not go into great detail. Those wishing to understand the full details are referred to Travis Oliphant's book "Guide to NumPy".
NumPy arrays consist of two major components, the raw array data (from now on,
referred to as the data buffer), and the information about the raw array data.
The data buffer is typically what people think of as arrays in C or Fortran,
a contiguous (and fixed) block of memory containing fixed sized data items.
NumPy also contains a significant set of data that describes how to interpret
the data in the data buffer. This extra information contains (among other things):
1) The basic data element's size in bytes
2) The start of the data within the data buffer (an offset relative to the
beginning of the data buffer).
3) The number of dimensions and the size of each dimension
4) The separation between elements for each dimension (the 'stride'). This
does not have to be a multiple of the element size
5) The byte order of the data (which may not be the native byte order)
6) Whether the buffer is read-only
7) Information (via the dtype object) about the interpretation of the basic
data element. The basic data element may be as simple as a int or a float,
or it may be a compound object (e.g., struct-like), a fixed character field,
or Python object pointers.
8) Whether the array is to interpreted as C-order or Fortran-order.
This arrangement allow for very flexible use of arrays. One thing that it allows
is simple changes of the metadata to change the interpretation of the array buffer.
Changing the byteorder of the array is a simple change involving no rearrangement
of the data. The shape of the array can be changed very easily without changing
anything in the data buffer or any data copying at all
Among other things that are made possible is one can create a new array metadata
object that uses the same data buffer
to create a new view of that data buffer that has a different interpretation
of the buffer (e.g., different shape, offset, byte order, strides, etc) but
shares the same data bytes. Many operations in numpy do just this such as
slices. Other operations, such as transpose, don't move data elements
around in the array, but rather change the information about the shape and strides so that the indexing of the array changes, but the data in the doesn't move.
Typically these new versions of the array metadata but the same data buffer are
new 'views' into the data buffer. There is a different ndarray object, but it
uses the same data buffer. This is why it is necessary to force copies through
use of the .copy() method if one really wants to make a new and independent
copy of the data buffer.
New views into arrays mean the object reference counts for the data buffer
increase. Simply doing away with the original array object will not remove the
data buffer if other views of it still exist.
Multidimensional Array Indexing Order Issues
============================================
What is the right way to index
multi-dimensional arrays? Before you jump to conclusions about the one and
true way to index multi-dimensional arrays, it pays to understand why this is
a confusing issue. This section will try to explain in detail how numpy
indexing works and why we adopt the convention we do for images, and when it
may be appropriate to adopt other conventions.
The first thing to understand is
that there are two conflicting conventions for indexing 2-dimensional arrays.
Matrix notation uses the first index to indicate which row is being selected and
the second index to indicate which column is selected. This is opposite the
geometrically oriented-convention for images where people generally think the
first index represents x position (i.e., column) and the second represents y
position (i.e., row). This alone is the source of much confusion;
matrix-oriented users and image-oriented users expect two different things with
regard to indexing.
The second issue to understand is how indices correspond
to the order the array is stored in memory. In Fortran the first index is the
most rapidly varying index when moving through the elements of a two
dimensional array as it is stored in memory. If you adopt the matrix
convention for indexing, then this means the matrix is stored one column at a
time (since the first index moves to the next row as it changes). Thus Fortran
is considered a Column-major language. C has just the opposite convention. In
C, the last index changes most rapidly as one moves through the array as
stored in memory. Thus C is a Row-major language. The matrix is stored by
rows. Note that in both cases it presumes that the matrix convention for
indexing is being used, i.e., for both Fortran and C, the first index is the
row. Note this convention implies that the indexing convention is invariant
and that the data order changes to keep that so.
But that's not the only way
to look at it. Suppose one has large two-dimensional arrays (images or
matrices) stored in data files. Suppose the data are stored by rows rather than
by columns. If we are to preserve our index convention (whether matrix or
image) that means that depending on the language we use, we may be forced to
reorder the data if it is read into memory to preserve our indexing
convention. For example if we read row-ordered data into memory without
reordering, it will match the matrix indexing convention for C, but not for
Fortran. Conversely, it will match the image indexing convention for Fortran,
but not for C. For C, if one is using data stored in row order, and one wants
to preserve the image index convention, the data must be reordered when
reading into memory.
In the end, which you do for Fortran or C depends on
which is more important, not reordering data or preserving the indexing
convention. For large images, reordering data is potentially expensive, and
often the indexing convention is inverted to avoid that.
The situation with
numpy makes this issue yet more complicated. The internal machinery of numpy
arrays is flexible enough to accept any ordering of indices. One can simply
reorder indices by manipulating the internal stride information for arrays
without reordering the data at all. NumPy will know how to map the new index
order to the data without moving the data.
So if this is true, why not choose
the index order that matches what you most expect? In particular, why not define
row-ordered images to use the image convention? (This is sometimes referred
to as the Fortran convention vs the C convention, thus the 'C' and 'FORTRAN'
order options for array ordering in numpy.) The drawback of doing this is
potential performance penalties. It's common to access the data sequentially,
either implicitly in array operations or explicitly by looping over rows of an
image. When that is done, then the data will be accessed in non-optimal order.
As the first index is incremented, what is actually happening is that elements
spaced far apart in memory are being sequentially accessed, with usually poor
memory access speeds. For example, for a two dimensional image 'im' defined so
that im[0, 10] represents the value at x=0, y=10. To be consistent with usual
Python behavior then im[0] would represent a column at x=0. Yet that data
would be spread over the whole array since the data are stored in row order.
Despite the flexibility of numpy's indexing, it can't really paper over the fact
basic operations are rendered inefficient because of data order or that getting
contiguous subarrays is still awkward (e.g., im[:,0] for the first row, vs
im[0]), thus one can't use an idiom such as for row in im; for col in im does
work, but doesn't yield contiguous column data.
As it turns out, numpy is
smart enough when dealing with ufuncs to determine which index is the most
rapidly varying one in memory and uses that for the innermost loop. Thus for
ufuncs there is no large intrinsic advantage to either approach in most cases.
On the other hand, use of .flat with an FORTRAN ordered array will lead to
non-optimal memory access as adjacent elements in the flattened array (iterator,
actually) are not contiguous in memory.
Indeed, the fact is that Python
indexing on lists and other sequences naturally leads to an outside-to inside
ordering (the first index gets the largest grouping, the next the next largest,
and the last gets the smallest element). Since image data are normally stored
by rows, this corresponds to position within rows being the last item indexed.
If you do want to use Fortran ordering realize that
there are two approaches to consider: 1) accept that the first index is just not
the most rapidly changing in memory and have all your I/O routines reorder
your data when going from memory to disk or visa versa, or use numpy's
mechanism for mapping the first index to the most rapidly varying data. We
recommend the former if possible. The disadvantage of the latter is that many
of numpy's functions will yield arrays without Fortran ordering unless you are
careful to use the 'order' keyword. Doing this would be highly inconvenient.
Otherwise we recommend simply learning to reverse the usual order of indices
when accessing elements of an array. Granted, it goes against the grain, but
it is more in line with Python semantics and the natural order of the data.
"""

View file

@ -0,0 +1,226 @@
"""
=============
Miscellaneous
=============
IEEE 754 Floating Point Special Values
--------------------------------------
Special values defined in numpy: nan, inf,
NaNs can be used as a poor-man's mask (if you don't care what the
original value was)
Note: cannot use equality to test NaNs. E.g.: ::
>>> myarr = np.array([1., 0., np.nan, 3.])
>>> np.nonzero(myarr == np.nan)
(array([], dtype=int64),)
>>> np.nan == np.nan # is always False! Use special numpy functions instead.
False
>>> myarr[myarr == np.nan] = 0. # doesn't work
>>> myarr
array([ 1., 0., NaN, 3.])
>>> myarr[np.isnan(myarr)] = 0. # use this instead find
>>> myarr
array([ 1., 0., 0., 3.])
Other related special value functions: ::
isinf(): True if value is inf
isfinite(): True if not nan or inf
nan_to_num(): Map nan to 0, inf to max float, -inf to min float
The following corresponds to the usual functions except that nans are excluded
from the results: ::
nansum()
nanmax()
nanmin()
nanargmax()
nanargmin()
>>> x = np.arange(10.)
>>> x[3] = np.nan
>>> x.sum()
nan
>>> np.nansum(x)
42.0
How numpy handles numerical exceptions
--------------------------------------
The default is to ``'warn'`` for ``invalid``, ``divide``, and ``overflow``
and ``'ignore'`` for ``underflow``. But this can be changed, and it can be
set individually for different kinds of exceptions. The different behaviors
are:
- 'ignore' : Take no action when the exception occurs.
- 'warn' : Print a `RuntimeWarning` (via the Python `warnings` module).
- 'raise' : Raise a `FloatingPointError`.
- 'call' : Call a function specified using the `seterrcall` function.
- 'print' : Print a warning directly to ``stdout``.
- 'log' : Record error in a Log object specified by `seterrcall`.
These behaviors can be set for all kinds of errors or specific ones:
- all : apply to all numeric exceptions
- invalid : when NaNs are generated
- divide : divide by zero (for integers as well!)
- overflow : floating point overflows
- underflow : floating point underflows
Note that integer divide-by-zero is handled by the same machinery.
These behaviors are set on a per-thread basis.
Examples
--------
::
>>> oldsettings = np.seterr(all='warn')
>>> np.zeros(5,dtype=np.float32)/0.
invalid value encountered in divide
>>> j = np.seterr(under='ignore')
>>> np.array([1.e-100])**10
>>> j = np.seterr(invalid='raise')
>>> np.sqrt(np.array([-1.]))
FloatingPointError: invalid value encountered in sqrt
>>> def errorhandler(errstr, errflag):
... print("saw stupid error!")
>>> np.seterrcall(errorhandler)
<function err_handler at 0x...>
>>> j = np.seterr(all='call')
>>> np.zeros(5, dtype=np.int32)/0
FloatingPointError: invalid value encountered in divide
saw stupid error!
>>> j = np.seterr(**oldsettings) # restore previous
... # error-handling settings
Interfacing to C
----------------
Only a survey of the choices. Little detail on how each works.
1) Bare metal, wrap your own C-code manually.
- Plusses:
- Efficient
- No dependencies on other tools
- Minuses:
- Lots of learning overhead:
- need to learn basics of Python C API
- need to learn basics of numpy C API
- need to learn how to handle reference counting and love it.
- Reference counting often difficult to get right.
- getting it wrong leads to memory leaks, and worse, segfaults
- API will change for Python 3.0!
2) Cython
- Plusses:
- avoid learning C API's
- no dealing with reference counting
- can code in pseudo python and generate C code
- can also interface to existing C code
- should shield you from changes to Python C api
- has become the de-facto standard within the scientific Python community
- fast indexing support for arrays
- Minuses:
- Can write code in non-standard form which may become obsolete
- Not as flexible as manual wrapping
3) ctypes
- Plusses:
- part of Python standard library
- good for interfacing to existing sharable libraries, particularly
Windows DLLs
- avoids API/reference counting issues
- good numpy support: arrays have all these in their ctypes
attribute: ::
a.ctypes.data a.ctypes.get_strides
a.ctypes.data_as a.ctypes.shape
a.ctypes.get_as_parameter a.ctypes.shape_as
a.ctypes.get_data a.ctypes.strides
a.ctypes.get_shape a.ctypes.strides_as
- Minuses:
- can't use for writing code to be turned into C extensions, only a wrapper
tool.
4) SWIG (automatic wrapper generator)
- Plusses:
- around a long time
- multiple scripting language support
- C++ support
- Good for wrapping large (many functions) existing C libraries
- Minuses:
- generates lots of code between Python and the C code
- can cause performance problems that are nearly impossible to optimize
out
- interface files can be hard to write
- doesn't necessarily avoid reference counting issues or needing to know
API's
5) scipy.weave
- Plusses:
- can turn many numpy expressions into C code
- dynamic compiling and loading of generated C code
- can embed pure C code in Python module and have weave extract, generate
interfaces and compile, etc.
- Minuses:
- Future very uncertain: it's the only part of Scipy not ported to Python 3
and is effectively deprecated in favor of Cython.
6) Psyco
- Plusses:
- Turns pure python into efficient machine code through jit-like
optimizations
- very fast when it optimizes well
- Minuses:
- Only on intel (windows?)
- Doesn't do much for numpy?
Interfacing to Fortran:
-----------------------
The clear choice to wrap Fortran code is
`f2py <https://docs.scipy.org/doc/numpy/f2py/>`_.
Pyfort is an older alternative, but not supported any longer.
Fwrap is a newer project that looked promising but isn't being developed any
longer.
Interfacing to C++:
-------------------
1) Cython
2) CXX
3) Boost.python
4) SWIG
5) SIP (used mainly in PyQT)
"""

View file

@ -0,0 +1,646 @@
"""
=================
Structured Arrays
=================
Introduction
============
Structured arrays are ndarrays whose datatype is a composition of simpler
datatypes organized as a sequence of named :term:`fields <field>`. For example,
::
>>> x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
... dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
>>> x
array([('Rex', 9, 81.), ('Fido', 3, 27.)],
dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')])
Here ``x`` is a one-dimensional array of length two whose datatype is a
structure with three fields: 1. A string of length 10 or less named 'name', 2.
a 32-bit integer named 'age', and 3. a 32-bit float named 'weight'.
If you index ``x`` at position 1 you get a structure::
>>> x[1]
('Fido', 3, 27.0)
You can access and modify individual fields of a structured array by indexing
with the field name::
>>> x['age']
array([9, 3], dtype=int32)
>>> x['age'] = 5
>>> x
array([('Rex', 5, 81.), ('Fido', 5, 27.)],
dtype=[('name', 'U10'), ('age', '<i4'), ('weight', '<f4')])
Structured datatypes are designed to be able to mimic 'structs' in the C
language, and share a similar memory layout. They are meant for interfacing with
C code and for low-level manipulation of structured buffers, for example for
interpreting binary blobs. For these purposes they support specialized features
such as subarrays, nested datatypes, and unions, and allow control over the
memory layout of the structure.
Users looking to manipulate tabular data, such as stored in csv files, may find
other pydata projects more suitable, such as xarray, pandas, or DataArray.
These provide a high-level interface for tabular data analysis and are better
optimized for that use. For instance, the C-struct-like memory layout of
structured arrays in numpy can lead to poor cache behavior in comparison.
.. _defining-structured-types:
Structured Datatypes
====================
A structured datatype can be thought of as a sequence of bytes of a certain
length (the structure's :term:`itemsize`) which is interpreted as a collection
of fields. Each field has a name, a datatype, and a byte offset within the
structure. The datatype of a field may be any numpy datatype including other
structured datatypes, and it may also be a :term:`subarray data type` which
behaves like an ndarray of a specified shape. The offsets of the fields are
arbitrary, and fields may even overlap. These offsets are usually determined
automatically by numpy, but can also be specified.
Structured Datatype Creation
----------------------------
Structured datatypes may be created using the function :func:`numpy.dtype`.
There are 4 alternative forms of specification which vary in flexibility and
conciseness. These are further documented in the
:ref:`Data Type Objects <arrays.dtypes.constructing>` reference page, and in
summary they are:
1. A list of tuples, one tuple per field
Each tuple has the form ``(fieldname, datatype, shape)`` where shape is
optional. ``fieldname`` is a string (or tuple if titles are used, see
:ref:`Field Titles <titles>` below), ``datatype`` may be any object
convertible to a datatype, and ``shape`` is a tuple of integers specifying
subarray shape.
>>> np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])
dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])
If ``fieldname`` is the empty string ``''``, the field will be given a
default name of the form ``f#``, where ``#`` is the integer index of the
field, counting from 0 from the left::
>>> np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')])
dtype([('x', '<f4'), ('f1', '<i4'), ('z', '<i8')])
The byte offsets of the fields within the structure and the total
structure itemsize are determined automatically.
2. A string of comma-separated dtype specifications
In this shorthand notation any of the :ref:`string dtype specifications
<arrays.dtypes.constructing>` may be used in a string and separated by
commas. The itemsize and byte offsets of the fields are determined
automatically, and the field names are given the default names ``f0``,
``f1``, etc. ::
>>> np.dtype('i8, f4, S3')
dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')])
>>> np.dtype('3int8, float32, (2, 3)float64')
dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))])
3. A dictionary of field parameter arrays
This is the most flexible form of specification since it allows control
over the byte-offsets of the fields and the itemsize of the structure.
The dictionary has two required keys, 'names' and 'formats', and four
optional keys, 'offsets', 'itemsize', 'aligned' and 'titles'. The values
for 'names' and 'formats' should respectively be a list of field names and
a list of dtype specifications, of the same length. The optional 'offsets'
value should be a list of integer byte-offsets, one for each field within
the structure. If 'offsets' is not given the offsets are determined
automatically. The optional 'itemsize' value should be an integer
describing the total size in bytes of the dtype, which must be large
enough to contain all the fields.
::
>>> np.dtype({'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})
dtype([('col1', '<i4'), ('col2', '<f4')])
>>> np.dtype({'names': ['col1', 'col2'],
... 'formats': ['i4', 'f4'],
... 'offsets': [0, 4],
... 'itemsize': 12})
dtype({'names':['col1','col2'], 'formats':['<i4','<f4'], 'offsets':[0,4], 'itemsize':12})
Offsets may be chosen such that the fields overlap, though this will mean
that assigning to one field may clobber any overlapping field's data. As
an exception, fields of :class:`numpy.object` type cannot overlap with
other fields, because of the risk of clobbering the internal object
pointer and then dereferencing it.
The optional 'aligned' value can be set to ``True`` to make the automatic
offset computation use aligned offsets (see :ref:`offsets-and-alignment`),
as if the 'align' keyword argument of :func:`numpy.dtype` had been set to
True.
The optional 'titles' value should be a list of titles of the same length
as 'names', see :ref:`Field Titles <titles>` below.
4. A dictionary of field names
The use of this form of specification is discouraged, but documented here
because older numpy code may use it. The keys of the dictionary are the
field names and the values are tuples specifying type and offset::
>>> np.dtype({'col1': ('i1', 0), 'col2': ('f4', 1)})
dtype([('col1', 'i1'), ('col2', '<f4')])
This form is discouraged because Python dictionaries do not preserve order
in Python versions before Python 3.6, and the order of the fields in a
structured dtype has meaning. :ref:`Field Titles <titles>` may be
specified by using a 3-tuple, see below.
Manipulating and Displaying Structured Datatypes
------------------------------------------------
The list of field names of a structured datatype can be found in the ``names``
attribute of the dtype object::
>>> d = np.dtype([('x', 'i8'), ('y', 'f4')])
>>> d.names
('x', 'y')
The field names may be modified by assigning to the ``names`` attribute using a
sequence of strings of the same length.
The dtype object also has a dictionary-like attribute, ``fields``, whose keys
are the field names (and :ref:`Field Titles <titles>`, see below) and whose
values are tuples containing the dtype and byte offset of each field. ::
>>> d.fields
mappingproxy({'x': (dtype('int64'), 0), 'y': (dtype('float32'), 8)})
Both the ``names`` and ``fields`` attributes will equal ``None`` for
unstructured arrays. The recommended way to test if a dtype is structured is
with `if dt.names is not None` rather than `if dt.names`, to account for dtypes
with 0 fields.
The string representation of a structured datatype is shown in the "list of
tuples" form if possible, otherwise numpy falls back to using the more general
dictionary form.
.. _offsets-and-alignment:
Automatic Byte Offsets and Alignment
------------------------------------
Numpy uses one of two methods to automatically determine the field byte offsets
and the overall itemsize of a structured datatype, depending on whether
``align=True`` was specified as a keyword argument to :func:`numpy.dtype`.
By default (``align=False``), numpy will pack the fields together such that
each field starts at the byte offset the previous field ended, and the fields
are contiguous in memory. ::
>>> def print_offsets(d):
... print("offsets:", [d.fields[name][1] for name in d.names])
... print("itemsize:", d.itemsize)
>>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2'))
offsets: [0, 1, 2, 6, 7, 15]
itemsize: 17
If ``align=True`` is set, numpy will pad the structure in the same way many C
compilers would pad a C-struct. Aligned structures can give a performance
improvement in some cases, at the cost of increased datatype size. Padding
bytes are inserted between fields such that each field's byte offset will be a
multiple of that field's alignment, which is usually equal to the field's size
in bytes for simple datatypes, see :c:member:`PyArray_Descr.alignment`. The
structure will also have trailing padding added so that its itemsize is a
multiple of the largest field's alignment. ::
>>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2', align=True))
offsets: [0, 1, 4, 8, 16, 24]
itemsize: 32
Note that although almost all modern C compilers pad in this way by default,
padding in C structs is C-implementation-dependent so this memory layout is not
guaranteed to exactly match that of a corresponding struct in a C program. Some
work may be needed, either on the numpy side or the C side, to obtain exact
correspondence.
If offsets were specified using the optional ``offsets`` key in the
dictionary-based dtype specification, setting ``align=True`` will check that
each field's offset is a multiple of its size and that the itemsize is a
multiple of the largest field size, and raise an exception if not.
If the offsets of the fields and itemsize of a structured array satisfy the
alignment conditions, the array will have the ``ALIGNED`` :attr:`flag
<numpy.ndarray.flags>` set.
A convenience function :func:`numpy.lib.recfunctions.repack_fields` converts an
aligned dtype or array to a packed one and vice versa. It takes either a dtype
or structured ndarray as an argument, and returns a copy with fields re-packed,
with or without padding bytes.
.. _titles:
Field Titles
------------
In addition to field names, fields may also have an associated :term:`title`,
an alternate name, which is sometimes used as an additional description or
alias for the field. The title may be used to index an array, just like a
field name.
To add titles when using the list-of-tuples form of dtype specification, the
field name may be specified as a tuple of two strings instead of a single
string, which will be the field's title and field name respectively. For
example::
>>> np.dtype([(('my title', 'name'), 'f4')])
dtype([(('my title', 'name'), '<f4')])
When using the first form of dictionary-based specification, the titles may be
supplied as an extra ``'titles'`` key as described above. When using the second
(discouraged) dictionary-based specification, the title can be supplied by
providing a 3-element tuple ``(datatype, offset, title)`` instead of the usual
2-element tuple::
>>> np.dtype({'name': ('i4', 0, 'my title')})
dtype([(('my title', 'name'), '<i4')])
The ``dtype.fields`` dictionary will contain titles as keys, if any
titles are used. This means effectively that a field with a title will be
represented twice in the fields dictionary. The tuple values for these fields
will also have a third element, the field title. Because of this, and because
the ``names`` attribute preserves the field order while the ``fields``
attribute may not, it is recommended to iterate through the fields of a dtype
using the ``names`` attribute of the dtype, which will not list titles, as
in::
>>> for name in d.names:
... print(d.fields[name][:2])
(dtype('int64'), 0)
(dtype('float32'), 8)
Union types
-----------
Structured datatypes are implemented in numpy to have base type
:class:`numpy.void` by default, but it is possible to interpret other numpy
types as structured types using the ``(base_dtype, dtype)`` form of dtype
specification described in
:ref:`Data Type Objects <arrays.dtypes.constructing>`. Here, ``base_dtype`` is
the desired underlying dtype, and fields and flags will be copied from
``dtype``. This dtype is similar to a 'union' in C.
Indexing and Assignment to Structured arrays
============================================
Assigning data to a Structured Array
------------------------------------
There are a number of ways to assign values to a structured array: Using python
tuples, using scalar values, or using other structured arrays.
Assignment from Python Native Types (Tuples)
````````````````````````````````````````````
The simplest way to assign values to a structured array is using python tuples.
Each assigned value should be a tuple of length equal to the number of fields
in the array, and not a list or array as these will trigger numpy's
broadcasting rules. The tuple's elements are assigned to the successive fields
of the array, from left to right::
>>> x = np.array([(1, 2, 3), (4, 5, 6)], dtype='i8, f4, f8')
>>> x[1] = (7, 8, 9)
>>> x
array([(1, 2., 3.), (7, 8., 9.)],
dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '<f8')])
Assignment from Scalars
```````````````````````
A scalar assigned to a structured element will be assigned to all fields. This
happens when a scalar is assigned to a structured array, or when an
unstructured array is assigned to a structured array::
>>> x = np.zeros(2, dtype='i8, f4, ?, S1')
>>> x[:] = 3
>>> x
array([(3, 3., True, b'3'), (3, 3., True, b'3')],
dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])
>>> x[:] = np.arange(2)
>>> x
array([(0, 0., False, b'0'), (1, 1., True, b'1')],
dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])
Structured arrays can also be assigned to unstructured arrays, but only if the
structured datatype has just a single field::
>>> twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')])
>>> onefield = np.zeros(2, dtype=[('A', 'i4')])
>>> nostruct = np.zeros(2, dtype='i4')
>>> nostruct[:] = twofield
Traceback (most recent call last):
...
TypeError: Cannot cast array data from dtype([('A', '<i4'), ('B', '<i4')]) to dtype('int32') according to the rule 'unsafe'
Assignment from other Structured Arrays
```````````````````````````````````````
Assignment between two structured arrays occurs as if the source elements had
been converted to tuples and then assigned to the destination elements. That
is, the first field of the source array is assigned to the first field of the
destination array, and the second field likewise, and so on, regardless of
field names. Structured arrays with a different number of fields cannot be
assigned to each other. Bytes of the destination structure which are not
included in any of the fields are unaffected. ::
>>> a = np.zeros(3, dtype=[('a', 'i8'), ('b', 'f4'), ('c', 'S3')])
>>> b = np.ones(3, dtype=[('x', 'f4'), ('y', 'S3'), ('z', 'O')])
>>> b[:] = a
>>> b
array([(0., b'0.0', b''), (0., b'0.0', b''), (0., b'0.0', b'')],
dtype=[('x', '<f4'), ('y', 'S3'), ('z', 'O')])
Assignment involving subarrays
``````````````````````````````
When assigning to fields which are subarrays, the assigned value will first be
broadcast to the shape of the subarray.
Indexing Structured Arrays
--------------------------
Accessing Individual Fields
```````````````````````````
Individual fields of a structured array may be accessed and modified by indexing
the array with the field name. ::
>>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
>>> x['foo']
array([1, 3])
>>> x['foo'] = 10
>>> x
array([(10, 2.), (10, 4.)],
dtype=[('foo', '<i8'), ('bar', '<f4')])
The resulting array is a view into the original array. It shares the same
memory locations and writing to the view will modify the original array. ::
>>> y = x['bar']
>>> y[:] = 11
>>> x
array([(10, 11.), (10, 11.)],
dtype=[('foo', '<i8'), ('bar', '<f4')])
This view has the same dtype and itemsize as the indexed field, so it is
typically a non-structured array, except in the case of nested structures.
>>> y.dtype, y.shape, y.strides
(dtype('float32'), (2,), (12,))
If the accessed field is a subarray, the dimensions of the subarray
are appended to the shape of the result::
>>> x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
>>> x['a'].shape
(2, 2)
>>> x['b'].shape
(2, 2, 3, 3)
Accessing Multiple Fields
```````````````````````````
One can index and assign to a structured array with a multi-field index, where
the index is a list of field names.
.. warning::
The behavior of multi-field indexes changed from Numpy 1.15 to Numpy 1.16.
The result of indexing with a multi-field index is a view into the original
array, as follows::
>>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
>>> a[['a', 'c']]
array([(0, 0.), (0, 0.), (0, 0.)],
dtype={'names':['a','c'], 'formats':['<i4','<f4'], 'offsets':[0,8], 'itemsize':12})
Assignment to the view modifies the original array. The view's fields will be
in the order they were indexed. Note that unlike for single-field indexing, the
dtype of the view has the same itemsize as the original array, and has fields
at the same offsets as in the original array, and unindexed fields are merely
missing.
.. warning::
In Numpy 1.15, indexing an array with a multi-field index returned a copy of
the result above, but with fields packed together in memory as if
passed through :func:`numpy.lib.recfunctions.repack_fields`.
The new behavior as of Numpy 1.16 leads to extra "padding" bytes at the
location of unindexed fields compared to 1.15. You will need to update any
code which depends on the data having a "packed" layout. For instance code
such as::
>>> a[['a', 'c']].view('i8') # Fails in Numpy 1.16
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype
will need to be changed. This code has raised a ``FutureWarning`` since
Numpy 1.12, and similar code has raised ``FutureWarning`` since 1.7.
In 1.16 a number of functions have been introduced in the
:mod:`numpy.lib.recfunctions` module to help users account for this
change. These are
:func:`numpy.lib.recfunctions.repack_fields`.
:func:`numpy.lib.recfunctions.structured_to_unstructured`,
:func:`numpy.lib.recfunctions.unstructured_to_structured`,
:func:`numpy.lib.recfunctions.apply_along_fields`,
:func:`numpy.lib.recfunctions.assign_fields_by_name`, and
:func:`numpy.lib.recfunctions.require_fields`.
The function :func:`numpy.lib.recfunctions.repack_fields` can always be
used to reproduce the old behavior, as it will return a packed copy of the
structured array. The code above, for example, can be replaced with:
>>> from numpy.lib.recfunctions import repack_fields
>>> repack_fields(a[['a', 'c']]).view('i8') # supported in 1.16
array([0, 0, 0])
Furthermore, numpy now provides a new function
:func:`numpy.lib.recfunctions.structured_to_unstructured` which is a safer
and more efficient alternative for users who wish to convert structured
arrays to unstructured arrays, as the view above is often indeded to do.
This function allows safe conversion to an unstructured type taking into
account padding, often avoids a copy, and also casts the datatypes
as needed, unlike the view. Code such as:
>>> b = np.zeros(3, dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
>>> b[['x', 'z']].view('f4')
array([0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
can be made safer by replacing with:
>>> from numpy.lib.recfunctions import structured_to_unstructured
>>> structured_to_unstructured(b[['x', 'z']])
array([0, 0, 0])
Assignment to an array with a multi-field index modifies the original array::
>>> a[['a', 'c']] = (2, 3)
>>> a
array([(2, 0, 3.), (2, 0, 3.), (2, 0, 3.)],
dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<f4')])
This obeys the structured array assignment rules described above. For example,
this means that one can swap the values of two fields using appropriate
multi-field indexes::
>>> a[['a', 'c']] = a[['c', 'a']]
Indexing with an Integer to get a Structured Scalar
```````````````````````````````````````````````````
Indexing a single element of a structured array (with an integer index) returns
a structured scalar::
>>> x = np.array([(1, 2., 3.)], dtype='i, f, f')
>>> scalar = x[0]
>>> scalar
(1, 2., 3.)
>>> type(scalar)
<class 'numpy.void'>
Unlike other numpy scalars, structured scalars are mutable and act like views
into the original array, such that modifying the scalar will modify the
original array. Structured scalars also support access and assignment by field
name::
>>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
>>> s = x[0]
>>> s['bar'] = 100
>>> x
array([(1, 100.), (3, 4.)],
dtype=[('foo', '<i8'), ('bar', '<f4')])
Similarly to tuples, structured scalars can also be indexed with an integer::
>>> scalar = np.array([(1, 2., 3.)], dtype='i, f, f')[0]
>>> scalar[0]
1
>>> scalar[1] = 4
Thus, tuples might be thought of as the native Python equivalent to numpy's
structured types, much like native python integers are the equivalent to
numpy's integer types. Structured scalars may be converted to a tuple by
calling :func:`ndarray.item`::
>>> scalar.item(), type(scalar.item())
((1, 4.0, 3.0), <class 'tuple'>)
Viewing Structured Arrays Containing Objects
--------------------------------------------
In order to prevent clobbering object pointers in fields of
:class:`numpy.object` type, numpy currently does not allow views of structured
arrays containing objects.
Structure Comparison
--------------------
If the dtypes of two void structured arrays are equal, testing the equality of
the arrays will result in a boolean array with the dimensions of the original
arrays, with elements set to ``True`` where all fields of the corresponding
structures are equal. Structured dtypes are equal if the field names,
dtypes and titles are the same, ignoring endianness, and the fields are in
the same order::
>>> a = np.zeros(2, dtype=[('a', 'i4'), ('b', 'i4')])
>>> b = np.ones(2, dtype=[('a', 'i4'), ('b', 'i4')])
>>> a == b
array([False, False])
Currently, if the dtypes of two void structured arrays are not equivalent the
comparison fails, returning the scalar value ``False``. This behavior is
deprecated as of numpy 1.10 and will raise an error or perform elementwise
comparison in the future.
The ``<`` and ``>`` operators always return ``False`` when comparing void
structured arrays, and arithmetic and bitwise operations are not supported.
Record Arrays
=============
As an optional convenience numpy provides an ndarray subclass,
:class:`numpy.recarray`, and associated helper functions in the
:mod:`numpy.rec` submodule, that allows access to fields of structured arrays
by attribute instead of only by index. Record arrays also use a special
datatype, :class:`numpy.record`, that allows field access by attribute on the
structured scalars obtained from the array.
The simplest way to create a record array is with :func:`numpy.rec.array`::
>>> recordarr = np.rec.array([(1, 2., 'Hello'), (2, 3., "World")],
... dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
>>> recordarr.bar
array([ 2., 3.], dtype=float32)
>>> recordarr[1:2]
rec.array([(2, 3., b'World')],
dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])
>>> recordarr[1:2].foo
array([2], dtype=int32)
>>> recordarr.foo[1:2]
array([2], dtype=int32)
>>> recordarr[1].baz
b'World'
:func:`numpy.rec.array` can convert a wide variety of arguments into record
arrays, including structured arrays::
>>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
... dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
>>> recordarr = np.rec.array(arr)
The :mod:`numpy.rec` module provides a number of other convenience functions for
creating record arrays, see :ref:`record array creation routines
<routines.array-creation.rec>`.
A record array representation of a structured array can be obtained using the
appropriate `view <numpy-ndarray-view>`_::
>>> arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
... dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'a10')])
>>> recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
... type=np.recarray)
For convenience, viewing an ndarray as type :class:`np.recarray` will
automatically convert to :class:`np.record` datatype, so the dtype can be left
out of the view::
>>> recordarr = arr.view(np.recarray)
>>> recordarr.dtype
dtype((numpy.record, [('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')]))
To get back to a plain ndarray both the dtype and type must be reset. The
following view does so, taking into account the unusual case that the
recordarr was not a structured type::
>>> arr2 = recordarr.view(recordarr.dtype.fields or recordarr.dtype, np.ndarray)
Record array fields accessed by index or by attribute are returned as a record
array if the field has a structured type but as a plain ndarray otherwise. ::
>>> recordarr = np.rec.array([('Hello', (1, 2)), ("World", (3, 4))],
... dtype=[('foo', 'S6'),('bar', [('A', int), ('B', int)])])
>>> type(recordarr.foo)
<class 'numpy.ndarray'>
>>> type(recordarr.bar)
<class 'numpy.recarray'>
Note that if a field has the same name as an ndarray attribute, the ndarray
attribute takes precedence. Such fields will be inaccessible by attribute but
will still be accessible by index.
"""

View file

@ -0,0 +1,752 @@
"""=============================
Subclassing ndarray in python
=============================
Introduction
------------
Subclassing ndarray is relatively simple, but it has some complications
compared to other Python objects. On this page we explain the machinery
that allows you to subclass ndarray, and the implications for
implementing a subclass.
ndarrays and object creation
============================
Subclassing ndarray is complicated by the fact that new instances of
ndarray classes can come about in three different ways. These are:
#. Explicit constructor call - as in ``MySubClass(params)``. This is
the usual route to Python instance creation.
#. View casting - casting an existing ndarray as a given subclass
#. New from template - creating a new instance from a template
instance. Examples include returning slices from a subclassed array,
creating return types from ufuncs, and copying arrays. See
:ref:`new-from-template` for more details
The last two are characteristics of ndarrays - in order to support
things like array slicing. The complications of subclassing ndarray are
due to the mechanisms numpy has to support these latter two routes of
instance creation.
.. _view-casting:
View casting
------------
*View casting* is the standard ndarray mechanism by which you take an
ndarray of any subclass, and return a view of the array as another
(specified) subclass:
>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>
.. _new-from-template:
Creating new from template
--------------------------
New instances of an ndarray subclass can also come about by a very
similar mechanism to :ref:`view-casting`, when numpy finds it needs to
create a new instance from a template instance. The most obvious place
this has to happen is when you are taking slices of subclassed arrays.
For example:
>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False
The slice is a *view* onto the original ``c_arr`` data. So, when we
take a view from the ndarray, we return a new ndarray, of the same
class, that points to the data in the original.
There are other points in the use of ndarrays where we need such views,
such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
(see also :ref:`array-wrap`), and reducing methods (like
``c_arr.mean()``).
Relationship of view casting and new-from-template
--------------------------------------------------
These paths both use the same machinery. We make the distinction here,
because they result in different input to your methods. Specifically,
:ref:`view-casting` means you have created a new instance of your array
type from any potential subclass of ndarray. :ref:`new-from-template`
means you have created a new instance of your class from a pre-existing
instance, allowing you - for example - to copy across attributes that
are particular to your subclass.
Implications for subclassing
----------------------------
If we subclass ndarray, we need to deal not only with explicit
construction of our array type, but also :ref:`view-casting` or
:ref:`new-from-template`. NumPy has the machinery to do this, and this
machinery that makes subclassing slightly non-standard.
There are two aspects to the machinery that ndarray uses to support
views and new-from-template in subclasses.
The first is the use of the ``ndarray.__new__`` method for the main work
of object initialization, rather then the more usual ``__init__``
method. The second is the use of the ``__array_finalize__`` method to
allow subclasses to clean up after the creation of views and new
instances from templates.
A brief Python primer on ``__new__`` and ``__init__``
=====================================================
``__new__`` is a standard Python method, and, if present, is called
before ``__init__`` when we create a class instance. See the `python
__new__ documentation
<https://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail.
For example, consider the following Python code:
.. testcode::
class C:
def __new__(cls, *args):
print('Cls in __new__:', cls)
print('Args in __new__:', args)
# The `object` type __new__ method takes a single argument.
return object.__new__(cls)
def __init__(self, *args):
print('type(self) in __init__:', type(self))
print('Args in __init__:', args)
meaning that we get:
>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)
When we call ``C('hello')``, the ``__new__`` method gets its own class
as first argument, and the passed argument, which is the string
``'hello'``. After python calls ``__new__``, it usually (see below)
calls our ``__init__`` method, with the output of ``__new__`` as the
first argument (now a class instance), and the passed arguments
following.
As you can see, the object can be initialized in the ``__new__``
method or the ``__init__`` method, or both, and in fact ndarray does
not have an ``__init__`` method, because all the initialization is
done in the ``__new__`` method.
Why use ``__new__`` rather than just the usual ``__init__``? Because
in some cases, as for ndarray, we want to be able to return an object
of some other class. Consider the following:
.. testcode::
class D(C):
def __new__(cls, *args):
print('D cls is:', cls)
print('D args in __new__:', args)
return C.__new__(C, *args)
def __init__(self, *args):
# we never get here
print('In D __init__')
meaning that:
>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>
The definition of ``C`` is the same as before, but for ``D``, the
``__new__`` method returns an instance of class ``C`` rather than
``D``. Note that the ``__init__`` method of ``D`` does not get
called. In general, when the ``__new__`` method returns an object of
class other than the class in which it is defined, the ``__init__``
method of that class is not called.
This is how subclasses of the ndarray class are able to return views
that preserve the class type. When taking a view, the standard
ndarray machinery creates the new ndarray object with something
like::
obj = ndarray.__new__(subtype, shape, ...
where ``subdtype`` is the subclass. Thus the returned view is of the
same class as the subclass, rather than being of class ``ndarray``.
That solves the problem of returning views of the same type, but now
we have a new problem. The machinery of ndarray can set the class
this way, in its standard methods for taking views, but the ndarray
``__new__`` method knows nothing of what we have done in our own
``__new__`` method in order to set attributes, and so on. (Aside -
why not call ``obj = subdtype.__new__(...`` then? Because we may not
have a ``__new__`` method with the same call signature).
The role of ``__array_finalize__``
==================================
``__array_finalize__`` is the mechanism that numpy provides to allow
subclasses to handle the various ways that new instances get created.
Remember that subclass instances can come about in these three ways:
#. explicit constructor call (``obj = MySubClass(params)``). This will
call the usual sequence of ``MySubClass.__new__`` then (if it exists)
``MySubClass.__init__``.
#. :ref:`view-casting`
#. :ref:`new-from-template`
Our ``MySubClass.__new__`` method only gets called in the case of the
explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
``MySubClass.__init__`` to deal with the view casting and
new-from-template. It turns out that ``MySubClass.__array_finalize__``
*does* get called for all three methods of object creation, so this is
where our object creation housekeeping usually goes.
* For the explicit constructor call, our subclass will need to create a
new ndarray instance of its own class. In practice this means that
we, the authors of the code, will need to make a call to
``ndarray.__new__(MySubClass,...)``, a class-hierarchy prepared call to
``super(MySubClass, cls).__new__(cls, ...)``, or do view casting of an
existing array (see below)
* For view casting and new-from-template, the equivalent of
``ndarray.__new__(MySubClass,...`` is called, at the C level.
The arguments that ``__array_finalize__`` receives differ for the three
methods of instance creation above.
The following code allows us to look at the call sequences and arguments:
.. testcode::
import numpy as np
class C(np.ndarray):
def __new__(cls, *args, **kwargs):
print('In __new__ with class %s' % cls)
return super(C, cls).__new__(cls, *args, **kwargs)
def __init__(self, *args, **kwargs):
# in practice you probably will not need or want an __init__
# method for your subclass
print('In __init__ with class %s' % self.__class__)
def __array_finalize__(self, obj):
print('In array_finalize:')
print(' self type is %s' % type(self))
print(' obj type is %s' % type(obj))
Now:
>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
self type is <class 'C'>
obj type is <type 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
self type is <class 'C'>
obj type is <type 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
self type is <class 'C'>
obj type is <class 'C'>
The signature of ``__array_finalize__`` is::
def __array_finalize__(self, obj):
One sees that the ``super`` call, which goes to
``ndarray.__new__``, passes ``__array_finalize__`` the new object, of our
own class (``self``) as well as the object from which the view has been
taken (``obj``). As you can see from the output above, the ``self`` is
always a newly created instance of our subclass, and the type of ``obj``
differs for the three instance creation methods:
* When called from the explicit constructor, ``obj`` is ``None``
* When called from view casting, ``obj`` can be an instance of any
subclass of ndarray, including our own.
* When called in new-from-template, ``obj`` is another instance of our
own subclass, that we might use to update the new ``self`` instance.
Because ``__array_finalize__`` is the only method that always sees new
instances being created, it is the sensible place to fill in instance
defaults for new object attributes, among other tasks.
This may be clearer with an example.
Simple example - adding an extra attribute to ndarray
-----------------------------------------------------
.. testcode::
import numpy as np
class InfoArray(np.ndarray):
def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
strides=None, order=None, info=None):
# Create the ndarray instance of our type, given the usual
# ndarray input arguments. This will call the standard
# ndarray constructor, but return an object of our type.
# It also triggers a call to InfoArray.__array_finalize__
obj = super(InfoArray, subtype).__new__(subtype, shape, dtype,
buffer, offset, strides,
order)
# set the new 'info' attribute to the value passed
obj.info = info
# Finally, we must return the newly created object:
return obj
def __array_finalize__(self, obj):
# ``self`` is a new object resulting from
# ndarray.__new__(InfoArray, ...), therefore it only has
# attributes that the ndarray.__new__ constructor gave it -
# i.e. those of a standard ndarray.
#
# We could have got to the ndarray.__new__ call in 3 ways:
# From an explicit constructor - e.g. InfoArray():
# obj is None
# (we're in the middle of the InfoArray.__new__
# constructor, and self.info will be set when we return to
# InfoArray.__new__)
if obj is None: return
# From view casting - e.g arr.view(InfoArray):
# obj is arr
# (type(obj) can be InfoArray)
# From new-from-template - e.g infoarr[:3]
# type(obj) is InfoArray
#
# Note that it is here, rather than in the __new__ method,
# that we set the default value for 'info', because this
# method sees all creation of default objects - with the
# InfoArray.__new__ constructor, but also with
# arr.view(InfoArray).
self.info = getattr(obj, 'info', None)
# We do not need to return anything
Using the object looks like this:
>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True
This class isn't very useful, because it has the same constructor as the
bare ndarray object, including passing in buffers and shapes and so on.
We would probably prefer the constructor to be able to take an already
formed ndarray from the usual numpy calls to ``np.array`` and return an
object.
Slightly more realistic example - attribute added to existing array
-------------------------------------------------------------------
Here is a class that takes a standard ndarray that already exists, casts
as our type, and adds an extra attribute.
.. testcode::
import numpy as np
class RealisticInfoArray(np.ndarray):
def __new__(cls, input_array, info=None):
# Input array is an already formed ndarray instance
# We first cast to be our class type
obj = np.asarray(input_array).view(cls)
# add the new attribute to the created instance
obj.info = info
# Finally, we must return the newly created object:
return obj
def __array_finalize__(self, obj):
# see InfoArray.__array_finalize__ for comments
if obj is None: return
self.info = getattr(obj, 'info', None)
So:
>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'
.. _array-ufunc:
``__array_ufunc__`` for ufuncs
------------------------------
.. versionadded:: 1.13
A subclass can override what happens when executing numpy ufuncs on it by
overriding the default ``ndarray.__array_ufunc__`` method. This method is
executed *instead* of the ufunc and should return either the result of the
operation, or :obj:`NotImplemented` if the operation requested is not
implemented.
The signature of ``__array_ufunc__`` is::
def __array_ufunc__(ufunc, method, *inputs, **kwargs):
- *ufunc* is the ufunc object that was called.
- *method* is a string indicating how the Ufunc was called, either
``"__call__"`` to indicate it was called directly, or one of its
:ref:`methods<ufuncs.methods>`: ``"reduce"``, ``"accumulate"``,
``"reduceat"``, ``"outer"``, or ``"at"``.
- *inputs* is a tuple of the input arguments to the ``ufunc``
- *kwargs* contains any optional or keyword arguments passed to the
function. This includes any ``out`` arguments, which are always
contained in a tuple.
A typical implementation would convert any inputs or outputs that are
instances of one's own class, pass everything on to a superclass using
``super()``, and finally return the results after possible
back-conversion. An example, taken from the test case
``test_ufunc_override_with_super`` in ``core/tests/test_umath.py``, is the
following.
.. testcode::
input numpy as np
class A(np.ndarray):
def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
args = []
in_no = []
for i, input_ in enumerate(inputs):
if isinstance(input_, A):
in_no.append(i)
args.append(input_.view(np.ndarray))
else:
args.append(input_)
outputs = out
out_no = []
if outputs:
out_args = []
for j, output in enumerate(outputs):
if isinstance(output, A):
out_no.append(j)
out_args.append(output.view(np.ndarray))
else:
out_args.append(output)
kwargs['out'] = tuple(out_args)
else:
outputs = (None,) * ufunc.nout
info = {}
if in_no:
info['inputs'] = in_no
if out_no:
info['outputs'] = out_no
results = super(A, self).__array_ufunc__(ufunc, method,
*args, **kwargs)
if results is NotImplemented:
return NotImplemented
if method == 'at':
if isinstance(inputs[0], A):
inputs[0].info = info
return
if ufunc.nout == 1:
results = (results,)
results = tuple((np.asarray(result).view(A)
if output is None else output)
for result, output in zip(results, outputs))
if results and isinstance(results[0], A):
results[0].info = info
return results[0] if len(results) == 1 else results
So, this class does not actually do anything interesting: it just
converts any instances of its own to regular ndarray (otherwise, we'd
get infinite recursion!), and adds an ``info`` dictionary that tells
which inputs and outputs it converted. Hence, e.g.,
>>> a = np.arange(5.).view(A)
>>> b = np.sin(a)
>>> b.info
{'inputs': [0]}
>>> b = np.sin(np.arange(5.), out=(a,))
>>> b.info
{'outputs': [0]}
>>> a = np.arange(5.).view(A)
>>> b = np.ones(1).view(A)
>>> c = a + b
>>> c.info
{'inputs': [0, 1]}
>>> a += b
>>> a.info
{'inputs': [0, 1], 'outputs': [0]}
Note that another approach would be to to use ``getattr(ufunc,
methods)(*inputs, **kwargs)`` instead of the ``super`` call. For this example,
the result would be identical, but there is a difference if another operand
also defines ``__array_ufunc__``. E.g., lets assume that we evalulate
``np.add(a, b)``, where ``b`` is an instance of another class ``B`` that has
an override. If you use ``super`` as in the example,
``ndarray.__array_ufunc__`` will notice that ``b`` has an override, which
means it cannot evaluate the result itself. Thus, it will return
`NotImplemented` and so will our class ``A``. Then, control will be passed
over to ``b``, which either knows how to deal with us and produces a result,
or does not and returns `NotImplemented`, raising a ``TypeError``.
If instead, we replace our ``super`` call with ``getattr(ufunc, method)``, we
effectively do ``np.add(a.view(np.ndarray), b)``. Again, ``B.__array_ufunc__``
will be called, but now it sees an ``ndarray`` as the other argument. Likely,
it will know how to handle this, and return a new instance of the ``B`` class
to us. Our example class is not set up to handle this, but it might well be
the best approach if, e.g., one were to re-implement ``MaskedArray`` using
``__array_ufunc__``.
As a final note: if the ``super`` route is suited to a given class, an
advantage of using it is that it helps in constructing class hierarchies.
E.g., suppose that our other class ``B`` also used the ``super`` in its
``__array_ufunc__`` implementation, and we created a class ``C`` that depended
on both, i.e., ``class C(A, B)`` (with, for simplicity, not another
``__array_ufunc__`` override). Then any ufunc on an instance of ``C`` would
pass on to ``A.__array_ufunc__``, the ``super`` call in ``A`` would go to
``B.__array_ufunc__``, and the ``super`` call in ``B`` would go to
``ndarray.__array_ufunc__``, thus allowing ``A`` and ``B`` to collaborate.
.. _array-wrap:
``__array_wrap__`` for ufuncs and other functions
-------------------------------------------------
Prior to numpy 1.13, the behaviour of ufuncs could only be tuned using
``__array_wrap__`` and ``__array_prepare__``. These two allowed one to
change the output type of a ufunc, but, in contrast to
``__array_ufunc__``, did not allow one to make any changes to the inputs.
It is hoped to eventually deprecate these, but ``__array_wrap__`` is also
used by other numpy functions and methods, such as ``squeeze``, so at the
present time is still needed for full functionality.
Conceptually, ``__array_wrap__`` "wraps up the action" in the sense of
allowing a subclass to set the type of the return value and update
attributes and metadata. Let's show how this works with an example. First
we return to the simpler example subclass, but with a different name and
some print statements:
.. testcode::
import numpy as np
class MySubClass(np.ndarray):
def __new__(cls, input_array, info=None):
obj = np.asarray(input_array).view(cls)
obj.info = info
return obj
def __array_finalize__(self, obj):
print('In __array_finalize__:')
print(' self is %s' % repr(self))
print(' obj is %s' % repr(obj))
if obj is None: return
self.info = getattr(obj, 'info', None)
def __array_wrap__(self, out_arr, context=None):
print('In __array_wrap__:')
print(' self is %s' % repr(self))
print(' arr is %s' % repr(out_arr))
# then just call the parent
return super(MySubClass, self).__array_wrap__(self, out_arr, context)
We run a ufunc on an instance of our new array:
>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
self is MySubClass([0, 1, 2, 3, 4])
obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
self is MySubClass([0, 1, 2, 3, 4])
arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
self is MySubClass([1, 3, 5, 7, 9])
obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'
Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method
with arguments ``self`` as ``obj``, and ``out_arr`` as the (ndarray) result
of the addition. In turn, the default ``__array_wrap__``
(``ndarray.__array_wrap__``) has cast the result to class ``MySubClass``,
and called ``__array_finalize__`` - hence the copying of the ``info``
attribute. This has all happened at the C level.
But, we could do anything we wanted:
.. testcode::
class SillySubClass(np.ndarray):
def __array_wrap__(self, arr, context=None):
return 'I lost your data'
>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'
So, by defining a specific ``__array_wrap__`` method for our subclass,
we can tweak the output from ufuncs. The ``__array_wrap__`` method
requires ``self``, then an argument - which is the result of the ufunc -
and an optional parameter *context*. This parameter is returned by
ufuncs as a 3-element tuple: (name of the ufunc, arguments of the ufunc,
domain of the ufunc), but is not set by other numpy functions. Though,
as seen above, it is possible to do otherwise, ``__array_wrap__`` should
return an instance of its containing class. See the masked array
subclass for an implementation.
In addition to ``__array_wrap__``, which is called on the way out of the
ufunc, there is also an ``__array_prepare__`` method which is called on
the way into the ufunc, after the output arrays are created but before any
computation has been performed. The default implementation does nothing
but pass through the array. ``__array_prepare__`` should not attempt to
access the array data or resize the array, it is intended for setting the
output array type, updating attributes and metadata, and performing any
checks based on the input that may be desired before computation begins.
Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or
subclass thereof or raise an error.
Extra gotchas - custom ``__del__`` methods and ndarray.base
-----------------------------------------------------------
One of the problems that ndarray solves is keeping track of memory
ownership of ndarrays and their views. Consider the case where we have
created an ndarray, ``arr`` and have taken a slice with ``v = arr[1:]``.
The two objects are looking at the same memory. NumPy keeps track of
where the data came from for a particular array or view, with the
``base`` attribute:
>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the view it derived from
>>> v2.base is v1
True
In general, if the array owns its own memory, as for ``arr`` in this
case, then ``arr.base`` will be None - there are some exceptions to this
- see the numpy book for more details.
The ``base`` attribute is useful in being able to tell whether we have
a view or the original array. This in turn can be useful if we need
to know whether or not to do some specific cleanup when the subclassed
array is deleted. For example, we may only want to do the cleanup if
the original array is deleted, but not the views. For an example of
how this can work, have a look at the ``memmap`` class in
``numpy.core``.
Subclassing and Downstream Compatibility
----------------------------------------
When sub-classing ``ndarray`` or creating duck-types that mimic the ``ndarray``
interface, it is your responsibility to decide how aligned your APIs will be
with those of numpy. For convenience, many numpy functions that have a corresponding
``ndarray`` method (e.g., ``sum``, ``mean``, ``take``, ``reshape``) work by checking
if the first argument to a function has a method of the same name. If it exists, the
method is called instead of coercing the arguments to a numpy array.
For example, if you want your sub-class or duck-type to be compatible with
numpy's ``sum`` function, the method signature for this object's ``sum`` method
should be the following:
.. testcode::
def sum(self, axis=None, dtype=None, out=None, keepdims=False):
...
This is the exact same method signature for ``np.sum``, so now if a user calls
``np.sum`` on this object, numpy will call the object's own ``sum`` method and
pass in these arguments enumerated above in the signature, and no errors will
be raised because the signatures are completely compatible with each other.
If, however, you decide to deviate from this signature and do something like this:
.. testcode::
def sum(self, axis=None, dtype=None):
...
This object is no longer compatible with ``np.sum`` because if you call ``np.sum``,
it will pass in unexpected arguments ``out`` and ``keepdims``, causing a TypeError
to be raised.
If you wish to maintain compatibility with numpy and its subsequent versions (which
might add new keyword arguments) but do not want to surface all of numpy's arguments,
your function's signature should accept ``**kwargs``. For example:
.. testcode::
def sum(self, axis=None, dtype=None, **unused_kwargs):
...
This object is now compatible with ``np.sum`` again because any extraneous arguments
(i.e. keywords that are not ``axis`` or ``dtype``) will be hidden away in the
``**unused_kwargs`` parameter.
"""

View file

@ -0,0 +1,137 @@
"""
===================
Universal Functions
===================
Ufuncs are, generally speaking, mathematical functions or operations that are
applied element-by-element to the contents of an array. That is, the result
in each output array element only depends on the value in the corresponding
input array (or arrays) and on no other array elements. NumPy comes with a
large suite of ufuncs, and scipy extends that suite substantially. The simplest
example is the addition operator: ::
>>> np.array([0,2,3,4]) + np.array([1,1,-1,2])
array([1, 3, 2, 6])
The ufunc module lists all the available ufuncs in numpy. Documentation on
the specific ufuncs may be found in those modules. This documentation is
intended to address the more general aspects of ufuncs common to most of
them. All of the ufuncs that make use of Python operators (e.g., +, -, etc.)
have equivalent functions defined (e.g. add() for +)
Type coercion
=============
What happens when a binary operator (e.g., +,-,\\*,/, etc) deals with arrays of
two different types? What is the type of the result? Typically, the result is
the higher of the two types. For example: ::
float32 + float64 -> float64
int8 + int32 -> int32
int16 + float32 -> float32
float32 + complex64 -> complex64
There are some less obvious cases generally involving mixes of types
(e.g. uints, ints and floats) where equal bit sizes for each are not
capable of saving all the information in a different type of equivalent
bit size. Some examples are int32 vs float32 or uint32 vs int32.
Generally, the result is the higher type of larger size than both
(if available). So: ::
int32 + float32 -> float64
uint32 + int32 -> int64
Finally, the type coercion behavior when expressions involve Python
scalars is different than that seen for arrays. Since Python has a
limited number of types, combining a Python int with a dtype=np.int8
array does not coerce to the higher type but instead, the type of the
array prevails. So the rules for Python scalars combined with arrays is
that the result will be that of the array equivalent the Python scalar
if the Python scalar is of a higher 'kind' than the array (e.g., float
vs. int), otherwise the resultant type will be that of the array.
For example: ::
Python int + int8 -> int8
Python float + int8 -> float64
ufunc methods
=============
Binary ufuncs support 4 methods.
**.reduce(arr)** applies the binary operator to elements of the array in
sequence. For example: ::
>>> np.add.reduce(np.arange(10)) # adds all elements of array
45
For multidimensional arrays, the first dimension is reduced by default: ::
>>> np.add.reduce(np.arange(10).reshape(2,5))
array([ 5, 7, 9, 11, 13])
The axis keyword can be used to specify different axes to reduce: ::
>>> np.add.reduce(np.arange(10).reshape(2,5),axis=1)
array([10, 35])
**.accumulate(arr)** applies the binary operator and generates an an
equivalently shaped array that includes the accumulated amount for each
element of the array. A couple examples: ::
>>> np.add.accumulate(np.arange(10))
array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45])
>>> np.multiply.accumulate(np.arange(1,9))
array([ 1, 2, 6, 24, 120, 720, 5040, 40320])
The behavior for multidimensional arrays is the same as for .reduce(),
as is the use of the axis keyword).
**.reduceat(arr,indices)** allows one to apply reduce to selected parts
of an array. It is a difficult method to understand. See the documentation
at:
**.outer(arr1,arr2)** generates an outer operation on the two arrays arr1 and
arr2. It will work on multidimensional arrays (the shape of the result is
the concatenation of the two input shapes.: ::
>>> np.multiply.outer(np.arange(3),np.arange(4))
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6]])
Output arguments
================
All ufuncs accept an optional output array. The array must be of the expected
output shape. Beware that if the type of the output array is of a different
(and lower) type than the output result, the results may be silently truncated
or otherwise corrupted in the downcast to the lower type. This usage is useful
when one wants to avoid creating large temporary arrays and instead allows one
to reuse the same array memory repeatedly (at the expense of not being able to
use more convenient operator notation in expressions). Note that when the
output argument is used, the ufunc still returns a reference to the result.
>>> x = np.arange(2)
>>> np.add(np.arange(2),np.arange(2.),x)
array([0, 2])
>>> x
array([0, 2])
and & or as ufuncs
==================
Invariably people try to use the python 'and' and 'or' as logical operators
(and quite understandably). But these operators do not behave as normal
operators since Python treats these quite differently. They cannot be
overloaded with array equivalents. Thus using 'and' or 'or' with an array
results in an error. There are two alternatives:
1) use the ufunc functions logical_and() and logical_or().
2) use the bitwise operators & and \\|. The drawback of these is that if
the arguments to these operators are not boolean arrays, the result is
likely incorrect. On the other hand, most usages of logical_and and
logical_or are with boolean arrays. As long as one is careful, this is
a convenient way to apply these operators.
"""