nn.Linear output inconsistency

🐛 Bug

nn.Linear produces inconsistent outputs when bias=True

To Reproduce

Steps to reproduce the behavior:

import torch
from torch import nn

torch.manual_seed(123)

# Small layer: batched (3D) input and its 2D slice give identical outputs.
model = nn.Linear(10, 20, bias=True)
x = torch.rand(4, 1, 10)
print((model(x)[:, 0] == model(x[:, 0])).all())

# Larger layer with bias: the same comparison fails.
model = nn.Linear(512, 12800, bias=True)
x = torch.rand(4, 1, 512)
print((model(x)[:, 0] == model(x[:, 0])).all())

# Same larger layer without bias: the comparison passes again.
model = nn.Linear(512, 12800, bias=False)
print((model(x)[:, 0] == model(x[:, 0])).all())

This prints

tensor(True)
tensor(False)
tensor(True)

respectively.

Expected behavior

All three should print tensor(True).

Environment

PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 11.2.2 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.29)
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.0
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @albanD @mruberry

1 possible answer(s) on "nn.Linear output inconsistency"

  1. This is to be expected and is a result of floating point precision. Modifying the example slightly:

    [ins] In [2]:
             ...: import torch
             ...: from torch import nn
             ...:
             ...: torch.manual_seed(123)
             ...: model = nn.Linear(10,20,bias=True)
             ...: x= torch.rand(4,1,10)
             ...: print((model(x)[:,0] == model(x[:,0])).all())
             ...:
             ...: model = nn.Linear(512,12800,bias=True)
             ...: x= torch.rand(4,1,512)
             ...: out1 = model(x)[:,0]
             ...: out2 = model(x[:,0])
    tensor(True)
    
    [ins] In [3]: (out1 - out2).abs().max()
    Out[3]: tensor(1.1921e-07, grad_fn=<MaxBackward1>)
    

    The two tensors differ by only about 1e-7, and the magnitude of out1 and out2 at that point is roughly 1. Single-precision floats are accurate to around 6-7 decimal digits (float32 machine epsilon is about 1.19e-7), so a relative difference of this size is expected and looks fine.
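
    As a practical follow-up (not from the original thread): for float32 outputs like these, the meaningful check is closeness within tolerance via torch.allclose, not exact elementwise equality. A minimal self-contained sketch of that check, reusing the same layer sizes as the report above:

        import torch
        from torch import nn

        torch.manual_seed(123)
        model = nn.Linear(512, 12800, bias=True)
        x = torch.rand(4, 1, 512)

        out1 = model(x)[:, 0]    # batched (3D) input, then slice the middle dim
        out2 = model(x[:, 0])    # the same rows passed as a 2D matrix

        # Bitwise equality is too strict: the two calls may take different code
        # paths internally, so the last bit of rounding can differ.
        print(torch.equal(out1, out2))       # may be False

        # Comparison within floating point tolerance is the meaningful check.
        print(torch.allclose(out1, out2))    # True (defaults: rtol=1e-5, atol=1e-8)
        print((out1 - out2).abs().max())     # on the order of 1e-7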