python – Is there a Numpy or pyTorch function for this code?

Basically, is there a NumPy or PyTorch function that does this:

dims = vp_sa_s.size()
for i in range(dims[0]):
    for j in range(dims[1]):
        for k in range(dims[2]):
            # to mimic MATLAB functionality: vp(mdp_data.sa_s)
            try:
                vp_sa_s[i, j, k] = vp[mdp_data['sa_s'][i, j, k]]
            except Exception:
                print("didn't work with", mdp_data['sa_s'])

Given that vp_sa_s is size (10,5,5) and each value is a valid index into vp, i.e. in the range 0-9. vp is size (10,1) with a bunch of random values.

MATLAB does this elegantly and quickly with vp(mdp_data.sa_s), which forms a new (10,5,5) matrix. If all values in mdp_data.sa_s were 1, the result would be a (10,5,5) tensor with every value equal to the 1st value in vp.
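To make the desired behaviour concrete, here is a small stand-alone sketch of what I am hoping a single indexing expression can do (the tensor names and shapes are stand-ins for my real data, and I have not confirmed this is the idiomatic PyTorch way):

import torch

vp = torch.rand(10, 1)                    # column of random values, like my vp
sa_s = torch.randint(0, 10, (10, 5, 5))   # stands in for mdp_data['sa_s']

# Every entry of sa_s selects a row of vp, producing a (10, 5, 5) result,
# analogous to MATLAB's vp(mdp_data.sa_s).
vp_sa_s = vp.squeeze(-1)[sa_s]
print(vp_sa_s.shape)  # torch.Size([10, 5, 5])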

Does a function or method exist that can achieve this in less than O(N^3) time? The above code is terribly inefficient.

Thanks!

python – Is it only me or are people shifting from Tensorflow to Pytorch?

TensorFlow certainly runs the ML industry, and PyTorch is, you could say, still used mainly by researchers and students. But recently I have started to stumble upon PyTorch code more often than before. I guess people have started using PyTorch for development as well, and mastering it could give us, as ML developers, an edge. Still, these are just my thoughts; I am seeking answers from experts who surely possess more insight into this.

Mask R-CNN optimizer and learning rate scheduler in Pytorch

In the Mask R-CNN paper, the optimizer used for training on the MS COCO 2014/2015 dataset for instance segmentation (I believe this is the dataset, correct me if this is wrong) is described as follows:

We train on 8 GPUs (so effective minibatch
size is 16) for 160k iterations, with a learning rate of
0.02 which is decreased by 10 at the 120k iteration. We
use a weight decay of 0.0001 and momentum of 0.9. With
ResNeXt [45], we train with 1 image per GPU and the same
number of iterations, with a starting learning rate of 0.01.

I’m trying to write an optimizer and learning rate scheduler in Pytorch for a similar application, to match this description.

For the optimizer I have:

def get_Mask_RCNN_Optimizer(model, learning_rate=0.02):
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=0.0001)
    return optimizer

For the learning rate scheduler I have:

def get_MASK_RCNN_LR_Scheduler(optimizer, step_size):
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.1, verbose=True)
    return scheduler

When the authors say “decreased by 10”, do they mean divided by 10? Or do they literally mean subtracting 10, in which case we would have a negative learning rate, which seems odd/wrong. Any insights appreciated.
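For context, here is a minimal sketch of how I intend to wire these together, assuming an iteration-based training loop and assuming that “decreased by 10” means multiplied by gamma=0.1 (the model construction is just a placeholder):

import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn()         # placeholder model
optimizer = get_Mask_RCNN_Optimizer(model, learning_rate=0.02)       # defined above
scheduler = get_MASK_RCNN_LR_Scheduler(optimizer, step_size=120000)  # defined above

for iteration in range(160000):
    # ... forward pass, compute loss, loss.backward() would go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # with step_size=120000, the LR is multiplied by 0.1 at iteration 120k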

How to have batch norm not forget the batch statistics it just used in Pytorch?

I am in an unusual setting where I should not use running statistics (since that would be considered cheating, e.g. in meta-learning). However, I often run a forward pass on a set of points (5, in fact) and then want to evaluate on only 1 point using the statistics from that previous batch, but batch norm forgets the batch statistics it just used. I’ve tried to hard-code the values it should have, but I get strange errors (even when I copy checks from the PyTorch code itself, like the dimension-size check).

How do I hardcode the previous batch statistics so that batch norm works on a new single data point, and then reset them for the next fresh batch?

note: I don’t want to change the batch norm layer type.

Sample code I tried:

def set_tracking_running_stats(model):
    for attr in dir(model):
        if 'bn' in attr:
            target_attr = getattr(model, attr)
            target_attr.track_running_stats = True
            target_attr.running_mean = torch.nn.Parameter(torch.zeros(target_attr.num_features, requires_grad=False))
            target_attr.running_var = torch.nn.Parameter(torch.ones(target_attr.num_features, requires_grad=False))
            target_attr.num_batches_tracked = torch.nn.Parameter(torch.tensor(0, dtype=torch.long), requires_grad=False)
            # target_attr.reset_running_stats()
    return

my most common errors:

    raise ValueError('expected 2D or 3D input (got {}D input)'
ValueError: expected 2D or 3D input (got 1D input)

and

IndexError: Dimension out of range (expected to be in range of (-1, 0), but got 1)

pytorch forum: https://discuss.pytorch.org/t/how-to-use-have-batch-norm-not-forget-batch-statistics-it-just-used/103437
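For reference, here is the rough direction I am experimenting with, as a sketch on a toy BatchNorm1d layer rather than my real model (I set momentum=None so that a single forward pass makes the running statistics equal that batch's statistics):

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4, momentum=None)  # toy layer; momentum=None -> cumulative running stats
bn.reset_running_stats()

batch = torch.randn(5, 4)    # the 5-point batch
single = torch.randn(1, 4)   # the single point I want to evaluate afterwards

bn.train()
_ = bn(batch)                # running_mean / running_var now reflect this batch

bn.eval()                    # eval mode normalizes with the stored running stats
out = bn(single)

bn.reset_running_stats()     # forget everything before the next fresh batch
bn.train()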

c++ – Libtorch cannot detect the gpu, but pytorch successfully detects the gpu

Environment:

win10 + vs2017;

python 3.7.7 + pytorch 1.4.0;

libtorch 1.7.0 + cuda 10.1

In python

print(torch.cuda.is_available())  # True

In libtorch

std::cout << torch::cuda::device_count() << std::endl;//0
std::cout << torch::cuda::is_available() << std::endl;//0
std::cout << torch::cuda::cudnn_is_available() << std::endl;//0
std::cout << torch::hasCUDA() << std::endl;//0

And In the

python – increase efficiency of loops and element-wise operations in PyTorch implementation

For any input matrix W, I have the following implementation in PyTorch. I was wondering if it can be improved in terms of efficiency.

P.S. Would the current implementation break backpropagation?

import torch

W = torch.tensor(((0,1,0,0,0,0,0,0,0),
                  (1,0,1,0,0,1,0,0,0),
                  (0,1,0,3,0,0,0,0,0),
                  (0,0,3,0,1,0,0,0,0),
                  (0,0,0,1,0,1,1,0,0),
                  (0,1,0,0,1,0,0,0,0),
                  (0,0,0,0,1,0,0,1,0),
                  (0,0,0,0,0,0,1,0,1),
                  (0,0,0,0,0,0,0,1,0)))

n = len(W)
C = torch.empty(n, n)
I = torch.eye(n)
for i in range(n):
    for j in range(n):
        B = W.clone()
        B[i, j] = 0
        B[j, i] = 0

        tmp = torch.inverse(n * I - B)

        C[i, j] = tmp[i, j]
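One direction I have been considering is batching all n*n inverses into a single call; below is a sketch of that idea (it assumes a PyTorch version with torch.linalg, and I have not verified that it reproduces C exactly or that it plays nicely with backpropagation):

import torch

Wf = W.to(torch.float32)                          # W is the matrix defined above
n = Wf.shape[0]
nI = n * torch.eye(n)
base = nI - Wf                                    # shared part of every per-pair matrix

# Build one (n, n) matrix per pair (i, j): entries (i, j) and (j, i) of W are zeroed,
# so at those positions the matrix is simply the corresponding entry of n*I.
idx = torch.arange(n)
I_ = idx.view(-1, 1).expand(n, n)
J_ = idx.view(1, -1).expand(n, n)
A = base.expand(n, n, n, n).clone()               # A[i, j] is the matrix for pair (i, j)
A[I_, J_, I_, J_] = nI[I_, J_]
A[I_, J_, J_, I_] = nI[J_, I_]

inv = torch.linalg.inv(A.reshape(n * n, n, n))    # one batched inverse instead of n*n calls
C_fast = inv.reshape(n, n, n, n)[I_, J_, I_, J_]  # entry (i, j) of each inverse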

python – “Same” padding for Conv2DTranspose in Pytorch

I’m trying to follow this tutorial to implement a Fully-Convolutional Network for semantic segmentation in Pytorch. For one layer, they had something similar to this piece of code

fcn9 = tf.layers.conv2d_transpose(fcn8, filters=layer4.get_shape().as_list()[-1],
kernel_size=4, strides=(2, 2), padding='SAME', name="fcn9")

to upsample the image by a factor of 2. I have been trying the same in PyTorch, but PyTorch does not seem to have “same” padding for ConvTranspose2d. For example, with fcn8 having spatial size (4, 4), running it through nn.ConvTranspose2d() with similar arguments (minus the padding) outputs an image of size (10, 10) instead of (8, 8).

Is there any way I could overcome this?
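For reference, here is the output-size arithmetic I have worked out so far, as a toy sketch with made-up channel counts (I am not sure this padding choice generalizes to every kernel/stride combination):

import torch
import torch.nn as nn

x = torch.randn(1, 16, 4, 4)   # stand-in for the (4, 4) fcn8 feature map with 16 channels

# out = (in - 1)*stride - 2*padding + kernel_size   (dilation=1, output_padding=0)
# kernel_size=4, stride=2, padding=0 -> 10  (what I currently get)
# kernel_size=4, stride=2, padding=1 -> 8   (the 2x upsampling I want)
up = nn.ConvTranspose2d(16, 16, kernel_size=4, stride=2, padding=1)
print(up(x).shape)             # torch.Size([1, 16, 8, 8])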

performance – Specialized Normalization Running Slow in Pytorch

I have a tensor of shape z = (38, 38, 7, 7, 21) = (x_pos, y_pos, grid_i, grid_j, class_num), and I wish to normalize it according to the formula:
(formula image not reproduced here; per the code below, the 21 class scores at each (x, y, grid_i, grid_j) position are divided by their L2 norm)

I have produced a working example of what I mean below. The problem is that it is extremely slow: approximately 2-3 seconds per grid cell, and with 49 grid cells that is roughly 49*3 = 147 seconds per feature map, which is far too long considering I need to do this with thousands of image feature maps.
Any optimizations or obvious problems pointed out would be very much appreciated. This is part of a PyTorch convolutional neural network architecture, so I am using torch tensors and tensor ops.

import torch
def normalizeScoreMap(score_map):
    for grid_i in range(7):
        for grid_j in range(7):
            for x in range(38):
                for y in range(38):
                    grid_sum = torch.tensor(0.0).cuda()
                    for class_num in range(21):
                        grid_sum += torch.pow(score_map[x, y, grid_i, grid_j, class_num], 2)
                    grid_normalizer = torch.sqrt(grid_sum)
                    for class_num in range(21):
                        score_map[x, y, grid_i, grid_j, class_num] /= grid_normalizer
    return score_map

random_score_map = torch.rand(38,38,7,7,21).cuda()
score_map = normalizeScoreMap(random_score_map)
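For comparison, here is the fully vectorized version I am experimenting with (I believe it performs the same L2 normalization over the class dimension, but I have not checked it against the loop output element by element):

import torch

def normalizeScoreMapFast(score_map):
    # L2 norm over the last (class) dimension, then divide every class score by it.
    grid_normalizer = torch.sqrt(torch.sum(score_map ** 2, dim=-1, keepdim=True))
    return score_map / grid_normalizer

random_score_map = torch.rand(38, 38, 7, 7, 21).cuda()
fast_score_map = normalizeScoreMapFast(random_score_map)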

Edit: For reference, I have an i9-9900K CPU and an NVIDIA 2080 GPU, so my hardware is quite good. I would be willing to try multi-threading, but I am looking for more obvious problems/optimizations.

numpy – Calculating determinants via Cholesky decomposition in PyTorch

I’ve been trying to calculate the determinant of a 2×2 matrix via Cholesky decomposition in PyTorch, but it doesn’t give the same number as NumPy and I’m not sure why. From my understanding, you can calculate the determinant of a square positive-definite matrix by decomposing it into a lower triangular matrix and its transpose, i.e. M = LL^T.

Then, by the multiplicativity of the determinant, the determinant of M equals the determinant of L multiplied by the determinant of L^T, which in the case of triangular matrices is just the product of the diagonal. So the determinant of M should equal the product of the diagonal of L multiplied by the product of the diagonal of L^T.

However, when I implement this in PyTorch, I get the wrong value. I’ve copied an example code below.

import torch
import numpy as np

matrix = torch.Tensor(2,2).uniform_()
print("Matrix: n",matrix.detach().numpy(),"n")

print("Positive-definite?: ",np.all(np.linalg.eigvals(matrix.detach().numpy()) > 0))
det_np = np.linalg.det(matrix.detach().numpy())


det_tor = torch.cholesky(matrix, upper=False).diag().prod()**2

print("determinant (numpy) %8.4f" % (det_np))
print("determinant (torch) %8.4f" % (det_tor))

An example output would be something like this,

Matrix: 
 [[0.5305128  0.2795679 ]
 [0.41778737 0.40350497]] 

Positive-definite?:  True
determinant (numpy)   0.0973
determinant (torch)   0.0395

What is it that is wrong? Why is there a difference between these two methods?
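For what it's worth, one sanity check I am planning to run (a sketch): build a matrix that is symmetric positive-definite by construction and compare the two determinants again, since I am not sure my uniform random matrix actually satisfies the assumptions behind the Cholesky decomposition.

import torch
import numpy as np

A = torch.rand(2, 2)
M = A @ A.t() + 1e-3 * torch.eye(2)   # symmetric positive-definite by construction

det_np = np.linalg.det(M.numpy())
det_chol = torch.cholesky(M, upper=False).diag().prod() ** 2
print("determinant (numpy) %8.4f" % det_np)
print("determinant (torch) %8.4f" % det_chol.item())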

Thanks in advance!