'2.0.0'
Lecture 22
PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
are the basic data abstraction in PyTorch and are implemented by the torch.Tensor
class. The behave in much the same was as the other array libraries we’ve seen so far (numpy
, theano
, etc.)
As expected, tensors can be constructed from constant numeric values in lists or tuples.
Data type | dtype |
type() |
Comment |
---|---|---|---|
32-bit float | float32 or float |
FloatTensor |
Default float |
64-bit float | float64 or double |
DoubleTensor |
|
16-bit float | float16 or half |
HalfTensor |
|
16-bit brain float | bfloat16 |
BFloat16Tensor |
|
64-bit complex float | complex64 |
||
128-bit complex float | complex128 or cdouble |
||
8-bit integer (unsigned) | uint8 |
ByteTensor |
|
8-bit integer (signed) | int8 |
CharTensor |
|
16-bit integer (signed) | int16 or short |
ShortTensor |
|
32-bit integer (signed) | int32 or int |
IntTensor |
|
64-bit integer (signed) | int64 or long |
LongTensor |
Default integer |
Boolean | bool |
BoolTensor |
Just like NumPy and Pandas, types are specified via the dtype
argument and can be inspected via the dtype attribute.
When using types with less precision it is important to be careful about underflow and overflow (ints) and rounding errors (floats).
Error: RuntimeError: value cannot be converted to type int8 without overflow
tensor([-128], dtype=torch.int8)
tensor([255], dtype=torch.uint8)
tensor([44], dtype=torch.uint8)
tensor([300], dtype=torch.int16)
It is possible to easily move between NumPy arrays and Tensors via the from_numpy()
function and numpy()
method.
tensor([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]], dtype=torch.float64)
tensor([1, 2, 3])
array([[0.28367, 0.65673, 0.23876],
[0.73128, 0.60122, 0.30433]], dtype=float32)
array([[1, 1],
[1, 1]])
Just like NumPy torch tensor
objects support basic mathematical and logical operations with scalars and other tensors - torch provides implementations of most commonly needed mathematical functions.
Like NumPy in cases where tensor dimensions do not match, the broadcasting algorithm is used. The rules for broadcasting are:
Each tensor must have at least one dimension - no empty tensors.
Comparing the dimension sizes of the two tensors, going from last to first:
Each dimension must be equal, or
One of the dimensions must be of size 1, or
The dimension does not exist in one of the tensors
Consider the following 6 tensors:
which of the above could be multiplied together and produce a valid result via broadcasting (e.g. a*b
, a*c
, a*d
, etc.).
Explain why or why not broadcasting was able to be applied in each case.
In instances where we need to conserve memory it is possible to apply many functions such that a new tensor is not created but the original value(s) are replaced. These functions share the same name with the original functions but have a _
suffix.
All arithmetic functions are available as methods of the Tensor class,
The shape
of a tensor can be changed using the view()
or reshape()
methods. The former guarantees that the result shares data with the original object (but requires contiguity),the latter may or may not copy the data.
Error: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
tensor([[1., 1.],
[1., 1.],
[1., 1.]])
tensor([[1., 1., 1.],
[1., 1., 1.]])
tensor([0., 0., 0., 0., 0., 0.])
The squeeze()
and unsqueeze()
methods can be used to remove or add length 1 dimension(s) to a tensor.
Given the following tensors,
what reshaping is needed to make it possible so that a * b
and a * c
can be calculated via broadcasting?
Gradient tracking can be enabled using the requires_grad
argument at initialization, alternatively the requires_grad
flag can be set on the tensor or the enable_grad()
context manager used (via with
).
tensor([0.00000000, 0.10000000, 0.20000000, 0.30000001, 0.40000001, 0.50000000,
0.60000002, 0.69999999, 0.80000001, 0.90000004, 1.00000000, 1.09999990,
1.20000005, 1.29999995, 1.39999998, 1.50000000, 1.60000002, 1.70000005,
1.79999995, 1.89999998, 2.00000000], requires_grad=True)
Basics of the computation graph can be explored via the next_functions
attribute
In order to calculate the gradients we use the backward()
method on the output tensor (must be a scalar), this then makes the grad attribute available for the input (leaf) tensors.
<string>:1: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad
attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a
non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure
you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered
internally at /Users/runner/work/pytorch/pytorch/pytorch/build/aten/src/ATen/core/TensorBody.h:491.)
In context you can interpret x.grad
and m.grad
as the gradient of y
with respect to x
or m
respectively.
This allows for the automatic calculation and evaluation of the jacobian and hessian for a function defined using tensors.
for x in [0.,1.]:
for y in [0.,1.]:
print("x =",x, "y = ",y)
inputs = (torch.tensor([x]), torch.tensor([y]))
print(torch.autograd.functional.jacobian(f, inputs),"\n")
x = 0.0 y = 0.0
(tensor([[3.]]), tensor([[0.]]))
x = 0.0 y = 1.0
(tensor([[4.]]), tensor([[4.]]))
x = 1.0 y = 0.0
(tensor([[3.]]), tensor([[1.]]))
x = 1.0 y = 1.0
(tensor([[4.]]), tensor([[5.]]))
((tensor([[0.]]), tensor([[1.]])), (tensor([[1.]]), tensor([[4.]])))
((tensor([[0.]]), tensor([[1.]])), (tensor([[1.]]), tensor([[4.]])))
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.990
Model: OLS Adj. R-squared: 0.989
Method: Least Squares F-statistic: 1455.
Date: Fri, 07 Apr 2023 Prob (F-statistic): 1.44e-45
Time: 12:43:41 Log-Likelihood: 60.967
No. Observations: 50 AIC: -113.9
Df Residuals: 46 BIC: -106.3
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 3.161e-16 0.016 2e-14 1.000 -0.032 0.032
x 0.8476 0.014 59.444 0.000 0.819 0.876
I(x ** 2) -4.949e-17 0.003 -1.44e-14 1.000 -0.007 0.007
I(x ** 3) -0.0912 0.002 -42.977 0.000 -0.095 -0.087
==============================================================================
Omnibus: 3.322 Durbin-Watson: 0.177
Prob(Omnibus): 0.190 Jarque-Bera (JB): 1.647
Skew: 0.000 Prob(JB): 0.439
Kurtosis: 2.111 Cond. No. 19.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
tensor([-1.22464680e-16, -1.27877162e-01, -2.53654584e-01, -3.75267005e-01,
-4.90717552e-01, -5.98110530e-01, -6.95682551e-01, -7.81831482e-01,
-8.55142763e-01, -9.14412623e-01, -9.58667853e-01, -9.87181783e-01,
-9.99486216e-01, -9.95379113e-01, -9.74927912e-01, -9.38468422e-01,
-8.86599306e-01, -8.20172255e-01, -7.40277997e-01, -6.48228395e-01,
-5.45534901e-01, -4.33883739e-01, -3.15108218e-01, -1.91158629e-01,
-6.40702200e-02, 6.40702200e-02, 1.91158629e-01, 3.15108218e-01,
4.33883739e-01, 5.45534901e-01, 6.48228395e-01, 7.40277997e-01,
8.20172255e-01, 8.86599306e-01, 9.38468422e-01, 9.74927912e-01,
9.95379113e-01, 9.99486216e-01, 9.87181783e-01, 9.58667853e-01,
9.14412623e-01, 8.55142763e-01, 7.81831482e-01, 6.95682551e-01,
5.98110530e-01, 4.90717552e-01, 3.75267005e-01, 2.53654584e-01,
1.27877162e-01, 1.22464680e-16], dtype=torch.float64)
tensor([[ 1.00000000e+00, -3.14159265e+00, 9.86960440e+00, -3.10062767e+01],
[ 1.00000000e+00, -3.01336438e+00, 9.08036490e+00, -2.73624482e+01],
[ 1.00000000e+00, -2.88513611e+00, 8.32401038e+00, -2.40159029e+01],
[ 1.00000000e+00, -2.75690784e+00, 7.60054083e+00, -2.09539906e+01],
[ 1.00000000e+00, -2.62867957e+00, 6.90995627e+00, -1.81640609e+01],
[ 1.00000000e+00, -2.50045130e+00, 6.25225668e+00, -1.56334633e+01],
[ 1.00000000e+00, -2.37222302e+00, 5.62744208e+00, -1.33495477e+01],
[ 1.00000000e+00, -2.24399475e+00, 5.03551245e+00, -1.12996635e+01],
[ 1.00000000e+00, -2.11576648e+00, 4.47646780e+00, -9.47116053e+00],
[ 1.00000000e+00, -1.98753821e+00, 3.95030813e+00, -7.85138836e+00],
[ 1.00000000e+00, -1.85930994e+00, 3.45703344e+00, -6.42769664e+00],
[ 1.00000000e+00, -1.73108167e+00, 2.99664374e+00, -5.18743503e+00],
[ 1.00000000e+00, -1.60285339e+00, 2.56913900e+00, -4.11795318e+00],
[ 1.00000000e+00, -1.47462512e+00, 2.17451925e+00, -3.20660072e+00],
[ 1.00000000e+00, -1.34639685e+00, 1.81278448e+00, -2.44072732e+00],
[ 1.00000000e+00, -1.21816858e+00, 1.48393469e+00, -1.80768261e+00],
[ 1.00000000e+00, -1.08994031e+00, 1.18796988e+00, -1.29481625e+00],
[ 1.00000000e+00, -9.61712037e-01, 9.24890042e-01, -8.89477886e-01],
[ 1.00000000e+00, -8.33483765e-01, 6.94695187e-01, -5.79017160e-01],
[ 1.00000000e+00, -7.05255494e-01, 4.97385311e-01, -3.50783723e-01],
[ 1.00000000e+00, -5.77027222e-01, 3.32960415e-01, -1.92127223e-01],
[ 1.00000000e+00, -4.48798951e-01, 2.01420498e-01, -9.03973081e-02],
[ 1.00000000e+00, -3.20570679e-01, 1.02765560e-01, -3.29436254e-02],
[ 1.00000000e+00, -1.92342407e-01, 3.69956017e-02, -7.11582309e-03],
[ 1.00000000e+00, -6.41141358e-02, 4.11062241e-03, -2.63549003e-04],
[ 1.00000000e+00, 6.41141358e-02, 4.11062241e-03, 2.63549003e-04],
[ 1.00000000e+00, 1.92342407e-01, 3.69956017e-02, 7.11582309e-03],
[ 1.00000000e+00, 3.20570679e-01, 1.02765560e-01, 3.29436254e-02],
[ 1.00000000e+00, 4.48798951e-01, 2.01420498e-01, 9.03973081e-02],
[ 1.00000000e+00, 5.77027222e-01, 3.32960415e-01, 1.92127223e-01],
[ 1.00000000e+00, 7.05255494e-01, 4.97385311e-01, 3.50783723e-01],
[ 1.00000000e+00, 8.33483765e-01, 6.94695187e-01, 5.79017160e-01],
[ 1.00000000e+00, 9.61712037e-01, 9.24890042e-01, 8.89477886e-01],
[ 1.00000000e+00, 1.08994031e+00, 1.18796988e+00, 1.29481625e+00],
[ 1.00000000e+00, 1.21816858e+00, 1.48393469e+00, 1.80768261e+00],
[ 1.00000000e+00, 1.34639685e+00, 1.81278448e+00, 2.44072732e+00],
[ 1.00000000e+00, 1.47462512e+00, 2.17451925e+00, 3.20660072e+00],
[ 1.00000000e+00, 1.60285339e+00, 2.56913900e+00, 4.11795318e+00],
[ 1.00000000e+00, 1.73108167e+00, 2.99664374e+00, 5.18743503e+00],
[ 1.00000000e+00, 1.85930994e+00, 3.45703344e+00, 6.42769664e+00],
[ 1.00000000e+00, 1.98753821e+00, 3.95030813e+00, 7.85138836e+00],
[ 1.00000000e+00, 2.11576648e+00, 4.47646780e+00, 9.47116053e+00],
[ 1.00000000e+00, 2.24399475e+00, 5.03551245e+00, 1.12996635e+01],
[ 1.00000000e+00, 2.37222302e+00, 5.62744208e+00, 1.33495477e+01],
[ 1.00000000e+00, 2.50045130e+00, 6.25225668e+00, 1.56334633e+01],
[ 1.00000000e+00, 2.62867957e+00, 6.90995627e+00, 1.81640609e+01],
[ 1.00000000e+00, 2.75690784e+00, 7.60054083e+00, 2.09539906e+01],
[ 1.00000000e+00, 2.88513611e+00, 8.32401038e+00, 2.40159029e+01],
[ 1.00000000e+00, 3.01336438e+00, 9.08036490e+00, 2.73624482e+01],
[ 1.00000000e+00, 3.14159265e+00, 9.86960440e+00, 3.10062767e+01]],
dtype=torch.float64)
Going back to our discussion of optimization and gradient descent awhile back - we can update our guess for b
/ bt
by moving in the direction of the negative gradient. The step size is refered to as the learning rate which we will pick a relatively small value for.
yt = torch.tensor(y).unsqueeze(1)
Xt = torch.tensor(lm.model.exog)
bt = torch.randn((Xt.shape[1], 1), dtype=torch.float64, requires_grad=True)
learning_rate = 1e-5
for i in range(5000):
yt_pred = Xt @ bt
loss = (yt_pred - yt).pow(2).sum()
if i % 500 == 0:
print(f"Step: {i},\tloss: {loss.item()}")
loss.backward()
with torch.no_grad():
bt -= learning_rate * bt.grad
bt.grad = None
Step: 0, loss: 70161.1580804254
Step: 500, loss: 14.791178300540242
Step: 1000, loss: 8.825181658035252
Step: 1500, loss: 5.311942717260375
Step: 2000, loss: 3.2416251317783518
Step: 2500, loss: 2.020671792951764
Step: 3000, loss: 1.3000220383569292
Step: 3500, loss: 0.8742816442183534
Step: 4000, loss: 0.6225166364100523
Step: 4500, loss: 0.473473387453477
tensor([[ 0.03143311],
[ 0.78484316],
[-0.00520945],
[-0.08260584]], dtype=torch.float64, requires_grad=True)
class Model(torch.nn.Module):
def __init__(self, beta):
super().__init__()
beta.requires_grad = True
self.beta = torch.nn.Parameter(beta)
def forward(self, X):
return X @ self.beta
def training_loop(model, X, y, optimizer, n=1000):
losses = []
for i in range(n):
y_pred = model(X)
loss = (y_pred.squeeze() - y.squeeze()).pow(2).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
losses.append(loss.item())
return losses
class Model(torch.nn.Module):
def __init__(self, X, y, beta=None):
super().__init__()
self.X = X
self.y = y
if beta is None:
beta = torch.zeros(X.shape[1])
beta.requires_grad = True
self.beta = torch.nn.Parameter(beta)
def forward(self, X):
return X @ self.beta
def fit(self, opt, n=1000, loss_fn = torch.nn.MSELoss()):
losses = []
for i in range(n):
loss = loss_fn(self(self.X).squeeze(), self.y.squeeze())
loss.backward()
opt.step()
opt.zero_grad()
losses.append(loss.item())
return losses
Sta 663 - Spring 2023