NumPy and PyTorch Tensors Guide
Two things we need to be able to do with data:
- acquire
- process
Acquiring data also requires us to store it, and the most convenient tool we have at our disposal is the tensor, or n-dimensional array.
Initializing
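A minimal sketch of common ways to create a tensor and inspect it, assuming PyTorch is available as `torch` (the sizes below are illustrative):

```python
import torch

# A vector of evenly spaced values from 0 to 11
x = torch.arange(12, dtype=torch.float32)

x.shape    # torch.Size([12]) -- the length along each axis
x.numel()  # 12 -- the total number of elements

# Tensors filled with constants
zeros = torch.zeros((2, 3, 4))  # all zeros
ones = torch.ones((2, 3, 4))    # all ones
```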
Other ways to initialize:
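For instance, sampling random values or constructing a tensor from a nested Python list (a sketch; the values are illustrative):

```python
import torch

# Each element sampled from a standard normal distribution
r = torch.randn(3, 4)

# Construct a tensor directly from a (nested) Python list
t = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
```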
Reshaping
Change the shape of a tensor without changing:
- the number of elements
- the values of elements
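For example, reshaping a length-12 vector into a 3×4 matrix (a sketch):

```python
import torch

x = torch.arange(12)
X = x.reshape(3, 4)  # same 12 elements, same values, new shape (3, 4)
```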
Defining all dimensions is unnecessary: we only need to specify n - 1 of them, and the remaining one is inferred.
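A sketch using -1 for the inferred dimension:

```python
import torch

x = torch.arange(12)
X = x.reshape(-1, 4)  # the first dimension is inferred: shape (3, 4)
Y = x.reshape(3, -1)  # the second dimension is inferred: shape (3, 4)
```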
Operations
Elementwise operations
Apply a standard scalar operation to each element of an array, or, for two tensor inputs, apply the operation to each pair of corresponding elements.
These include the standard arithmetic operations (+, -, *, /, and **).
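A sketch with two illustrative vectors:

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2.0, 2, 2, 2])

x + y, x - y, x * y, x / y, x ** y  # each operation is applied pairwise
torch.exp(x)                        # unary operations also apply elementwise
```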
Hadamard product (Elementwise multiplication of two matrices)
Specifically, elementwise multiplication of two matrices is called their Hadamard product (math notation \(\odot\)). Consider matrix \(\mathbf{B} \in \mathbb{R}^{m \times n}\) whose element in row \(i\) and column \(j\) is \(b_{ij}\). The Hadamard product of matrices \(\mathbf{A}\) and \(\mathbf{B}\) is the matrix whose element in row \(i\) and column \(j\) is \(a_{ij} b_{ij}\).
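A sketch, using an illustrative matrix A and a copy B:

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
B = A.clone()  # a copy of A with its own memory

A * B  # Hadamard product: entry (i, j) equals a_ij * b_ij
```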
Linear Algebra Operations
Transpose
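For example (the matrix here is illustrative):

```python
import torch

A = torch.arange(6).reshape(2, 3)
A.T  # the transpose: rows and columns are swapped, shape (3, 2)
```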
As a special type of square matrix, a symmetric matrix \(\mathbf{A}\) is equal to its transpose: \(\mathbf{A} = \mathbf{A}^\top\). Here we define a symmetric matrix B.
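A sketch with an illustrative 3×3 symmetric matrix:

```python
import torch

B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B == B.T  # every entry is True, so B is symmetric
```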
Vector Dot Products
Given two vectors \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^d\), their dot product \(\mathbf{x}^\top \mathbf{y}\) (or \(\langle \mathbf{x}, \mathbf{y} \rangle\)) is a sum over the products of the elements at the same position: \(\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i\).
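A sketch with two illustrative length-4 vectors:

```python
import torch

x = torch.arange(4, dtype=torch.float32)  # [0., 1., 2., 3.]
y = torch.ones(4, dtype=torch.float32)

torch.dot(x, y)   # 6.0
torch.sum(x * y)  # equivalent: elementwise product, then sum
```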
Dot products are useful in a wide range of contexts. For example, given some set of values, denoted by a vector \(\mathbf{x} \in \mathbb{R}^d\), and a set of weights denoted by \(\mathbf{w} \in \mathbb{R}^d\), the weighted sum of the values in \(\mathbf{x}\) according to the weights \(\mathbf{w}\) could be expressed as the dot product \(\mathbf{x}^\top \mathbf{w}\). When the weights are non-negative and sum to one (i.e., \(\sum_{i=1}^{d} w_i = 1\)), the dot product expresses a weighted average. After normalizing two vectors to have unit length, the dot product expresses the cosine of the angle between them. We will formally introduce this notion of length later in this section.
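As an illustration of both uses (the values and weights below are made up):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
w = torch.tensor([0.1, 0.2, 0.3, 0.4])  # non-negative and summing to one

weighted_avg = torch.dot(x, w)  # weighted average of the entries of x

# Cosine of the angle between two vectors via a normalized dot product
y = torch.tensor([4.0, 3.0, 2.0, 1.0])
cos = torch.dot(x, y) / (torch.linalg.norm(x) * torch.linalg.norm(y))
```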
Matrix Multiplications
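Beyond dot products, PyTorch provides matrix-vector and matrix-matrix products. A minimal sketch (the matrices and vector here are illustrative):

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
x = torch.ones(3)
B = torch.ones(3, 4)

torch.mv(A, x)  # matrix-vector product, shape (2,)
torch.mm(A, B)  # matrix-matrix product, shape (2, 4)
A @ B           # the @ operator computes the same product
```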
Concatenation and Stacking
Provide the list of tensors and the axis along which to concatenate.
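A sketch concatenating and stacking two illustrative 3×4 matrices:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(3, 4)

torch.cat((X, Y), dim=0)    # join along rows: shape (6, 4)
torch.cat((X, Y), dim=1)    # join along columns: shape (3, 8)
torch.stack((X, Y), dim=0)  # stacking creates a new axis: shape (2, 3, 4)
```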
Summation
Summing all the elements in the tensor yields a tensor with only one element. You can also sum along just a given axis.
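A sketch with an illustrative 3×4 matrix:

```python
import torch

A = torch.arange(12, dtype=torch.float32).reshape(3, 4)

A.sum()        # a scalar tensor: the sum of all 12 elements
A.sum(dim=0)   # sum along axis 0 (over rows), result has shape (4,)
A.sum(dim=1)   # sum along axis 1 (over columns), result has shape (3,)
```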
Non-Reduction Sum
However, sometimes it can be useful to keep the number of axes unchanged when invoking the function for calculating the sum or mean.
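A sketch using keepdim; the division afterwards illustrates why keeping the axis helps:

```python
import torch

A = torch.arange(12, dtype=torch.float32).reshape(3, 4)

sum_A = A.sum(dim=1, keepdim=True)  # shape (3, 1) instead of (3,)
A / sum_A                           # broadcasting works because both still have 2 axes
```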
Cumulative Sum
If we want to calculate the cumulative sum of elements of A along some axis, say axis=0 (row by row), we can call the cumsum function. This function will not reduce the input tensor along any axis.
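A sketch:

```python
import torch

A = torch.arange(12, dtype=torch.float32).reshape(3, 4)
A.cumsum(dim=0)  # running totals down each column; shape stays (3, 4)
```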
Logical Operations
Sometimes, we want to construct a binary tensor via logical statements. Take X == Y as an example. For each position, if X and Y are equal at that position, the corresponding entry in the new tensor takes a value of 1, meaning that the logical statement X == Y is true at that position; otherwise that position takes 0.
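A sketch with illustrative tensors:

```python
import torch

X = torch.arange(12).reshape(3, 4)
Y = torch.ones(3, 4, dtype=torch.long)

X == Y  # a boolean tensor: True where entries match, False elsewhere
```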
Broadcasting
Under certain conditions, even when shapes differ, we can still perform elementwise operations by invoking the broadcasting mechanism. This mechanism works in the following way: First, expand one or both arrays by copying elements appropriately so that after this transformation, the two tensors have the same shape. Second, carry out the elementwise operations on the resulting arrays.
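A sketch defining two tensors a and b with mismatched shapes:

```python
import torch

a = torch.arange(3).reshape(3, 1)  # shape (3, 1)
b = torch.arange(2).reshape(1, 2)  # shape (1, 2)
```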
Since a and b are \(3\times1\) and \(1\times2\) matrices respectively, their shapes do not match up if we want to add them. We broadcast the entries of both matrices into a larger \(3\times2\) matrix as follows: for matrix a it replicates the columns, and for matrix b it replicates the rows, before adding up both elementwise.
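Continuing with the a and b defined above:

```python
a + b  # both operands are broadcast to shape (3, 2) before the elementwise add
# tensor([[0, 1],
#         [1, 2],
#         [2, 3]])
```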
Indexing and Slicing
Just as in any Python list, elements in a tensor can be accessed by index: the first element has index 0, and ranges include the first index but exclude the last. As in standard Python lists, we can access elements according to their position relative to the end of the list by using negative indices.
Thus, [-1] selects the last element and [1:3] selects the second and the third elements as follows:
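A sketch with an illustrative 3×4 matrix X:

```python
import torch

X = torch.arange(12).reshape(3, 4)

X[-1]   # the last row
X[1:3]  # the second and third rows
```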
Beyond reading, we can also write elements of a matrix by specifying indices.
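For example (illustrative indices and values):

```python
import torch

X = torch.arange(12).reshape(3, 4)

X[1, 2] = 9     # write a single element
X[0:2, :] = 12  # assign the same value to every element of the first two rows
```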
Saving Memory
Running operations can cause new memory to be allocated to host results. For example, if we write Y = X + Y, we will dereference the tensor that Y used to point to and instead point Y at the newly allocated memory. In the following example, we demonstrate this with Python’s id() function, which gives us the exact address of the referenced object in memory. After running Y = Y + X, we will find that id(Y) points to a different location. That is because Python first evaluates Y + X, allocating new memory for the result and then makes Y point to this new location in memory.
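A sketch demonstrating the reallocation with id():

```python
import torch

X = torch.arange(12)
Y = torch.ones(12, dtype=torch.long)

before = id(Y)
Y = Y + X
id(Y) == before  # False: Y now points at newly allocated memory
```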
This might be undesirable for two reasons. First, we do not want to run around allocating memory unnecessarily all the time. In machine learning, we might have hundreds of megabytes of parameters and update all of them multiple times per second. Typically, we will want to perform these updates in place. Second, we might point at the same parameters from multiple variables. If we do not update in place, other references will still point to the old memory location, making it possible for parts of our code to inadvertently reference stale parameters.
Fortunately, performing in-place operations is easy. We can assign the result of an operation to a previously allocated array with slice notation, e.g., Y[:] = &lt;expression&gt;.
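A sketch of both in-place patterns (Z is an illustrative pre-allocated tensor):

```python
import torch

X = torch.arange(12)
Y = torch.ones(12, dtype=torch.long)

Z = torch.zeros_like(Y)  # allocate a block with the same shape and dtype as Y
before = id(Z)
Z[:] = X + Y             # write the result into the existing block
id(Z) == before          # True: no new allocation for Z

# If X is not reused elsewhere, X += Y (or X[:] = X + Y) also updates in place
before = id(X)
X += Y
id(X) == before          # True
```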