pytorch – Enhancing performance using DataParallel

I have written the following code to practice parallelizing PyTorch code on GPUs:

import math
import torch
import pickle
import time

import numpy as np
import torch.optim as optim

from torch import nn

print('device_count()', torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print('get_device_name', torch.cuda.get_device_name(i))

def _data(dimension, num_examples):
    num_mislabeled_examples = 20

    ground_truth_weights = np.random.normal(size=dimension) / math.sqrt(dimension)
    ground_truth_threshold = 0

    features = np.random.normal(size=(num_examples, dimension)).astype(
        np.float32) / math.sqrt(dimension)
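    # Label is 1 when the example lies above the ground-truth hyperplane
    labels = (np.matmul(features, ground_truth_weights) >
              ground_truth_threshold).astype(np.float32)
    # Flip a few labels to simulate label noise
    mislabeled_indices = np.random.choice(
        num_examples, num_mislabeled_examples, replace=False)
    labels[mislabeled_indices] = 1 - labels[mislabeled_indices]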

    return torch.tensor(labels), torch.tensor(features)

class tools:
    def __init__(self):
        self.name = 'x_2'  # hypothetical attribute name; the original name was lost

    def SomeFunc(self, model, input_):
        print(model.first_term(input_)[0])    # change to model.module.first_term when the flag is True

class predictor(nn.Module):
    def __init__(self, dim):
        super(predictor, self).__init__()
        self.weights = torch.nn.Parameter(torch.zeros(dim, 1, requires_grad=True))
        self.threshold = torch.nn.Parameter(torch.zeros(1, 1, requires_grad=True))

    def first_term(self, features):
        return features @ self.weights

    def forward(self, features):
        return self.first_term(features) - self.threshold

class HingeLoss(nn.Module):

    def __init__(self):
        super(HingeLoss, self).__init__()
        self.relu = nn.ReLU()

    def forward(self, output, target):
        all_ones = torch.ones_like(target)
        labels = 2 * target - all_ones
        losses = all_ones - torch.mul(output.squeeze(1), labels)

        return torch.norm(self.relu(losses))

class function(object):
    def __init__(self, epochs):

        dim = 10
        N = 100
        self.labels, self.features = _data(dim, N)

        self.epochs = epochs
        self.model = predictor(dim).to('cuda')
        self.optimizer = optim.SGD(self.model.parameters(), lr=1e-3)
        self.labels = self.labels.to('cuda')
        self.features = self.features.to('cuda')
        self.loss_function = HingeLoss().to('cuda')
        self.tools = tools()

    def train(self):

        for epoch in range(self.epochs):
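            output = self.model(self.features)
            # self.tools.SomeFunc(self.model, self.features)
            loss = self.loss_function(output, self.labels)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            print('For epoch {}, loss is: {}.'.format(epoch, loss.item()))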

def main():
    model = function(1000)
    if False: # This is Flag
        if torch.cuda.device_count() > 1:
            model.model = nn.DataParallel(model.model)
    t = time.time()
    model.train()
    print('elapsed: {}'.format(time.time() - t))

if __name__ == '__main__':
    main()

As far as I understand, setting the flag to True should enable multi-GPU training, but my run time increases from 1 s to 15 s. I was wondering how to improve the performance.
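A likely reason for the slowdown, offered as an assumption rather than a diagnosis: the predictor has only 11 parameters (10 weights plus the threshold) and each batch has 100 examples, so on every forward pass the cost of nn.DataParallel scattering the inputs, replicating the module to each GPU and gathering the outputs outweighs the tiny computation it splits up. DataParallel tends to pay off only when each GPU receives a large amount of work. The sketch below uses two hypothetical helpers: unwrap reaches the underlying module whether or not it is wrapped (so first_term works with the flag on or off), and time_training_steps times SGD updates with torch.cuda.synchronize() so asynchronous GPU work is actually included in the measurement.

```python
import time

import torch
from torch import nn


def unwrap(model: nn.Module) -> nn.Module:
    """Return the inner module if `model` is an nn.DataParallel, else `model` itself."""
    return model.module if isinstance(model, nn.DataParallel) else model


def time_training_steps(model, features, labels, loss_fn, steps=100, lr=1e-3):
    """Wall-clock seconds for `steps` SGD updates of `model`.

    CUDA kernels run asynchronously, so we synchronize before reading the clock;
    otherwise the timer can stop before the GPU has finished its work.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    if features.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
    if features.is_cuda:
        torch.cuda.synchronize()
    return time.perf_counter() - start
```

For example, SomeFunc could call unwrap(model).first_term(input_), and timing predictor(dim) with and without the wrapper on a much larger N would show where, if anywhere, the wrapper starts to help.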

Neural networks for image classification with PyTorch and images from the Oxford 102 flower data set. How can a usable data loader be created?

I am writing a bachelor's thesis and am building a neural network with PyTorch and the Oxford 102 flower data set, but I cannot create a usable data loader for a CNN; the examples I found use Keras or TensorFlow. Does anyone know a way to do this, or could someone check my code? For some reason it doesn't work.
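A minimal sketch of a PyTorch loader, assuming you already have a list of (image path, class index) pairs for the images. FlowerImages and make_loader are hypothetical names, and load_fn is whatever turns a path into a float tensor of shape (3, H, W); in practice that could be torchvision.io.read_image followed by conversion to float and resizing, and recent torchvision releases also ship datasets.Flowers102 and datasets.ImageFolder that do this work for you. Two common reasons such loaders fail with CNNs: images of different sizes (the default batching needs every tensor in a batch to have the same shape) and class labels outside 0..101, which torch.nn.CrossEntropyLoss expects, so if your label file numbers classes from 1 to 102, subtract 1.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FlowerImages(Dataset):
    """Map-style dataset over a list of (path, label) pairs.

    `load_fn(path)` must return a float image tensor of shape (3, H, W);
    `transform`, if given, is applied to that tensor (e.g. resizing).
    """

    def __init__(self, samples, load_fn, transform=None):
        self.samples = samples
        self.load_fn = load_fn
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        image = self.load_fn(path)
        if self.transform is not None:
            image = self.transform(image)
        return image, label


def make_loader(samples, load_fn, batch_size=32, shuffle=True, transform=None):
    """Wrap the dataset in a DataLoader that yields (images, labels) batches."""
    dataset = FlowerImages(samples, load_fn, transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```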

pytorch – How to remove loops and conditional instructions

Hello, I'm new to PyTorch. I was wondering how I can write this more efficiently in PyTorch, ideally without the for loops and the conditional statements. z, v_pieces and w_pieces are all 1-D tensors.

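import torch

def fit_dz(z, v_pieces, w_pieces):
    g_dz = torch.zeros_like(z)  # slope stays 0 outside every segment
    for i in range(len(z)):
        if 0 < z[i] < v_pieces[0]:
            # first segment runs from the origin to (w_pieces[0], v_pieces[0]) (assumed intent: j = 0)
            g_dz[i] = (v_pieces[0] - 0) / (w_pieces[0] - 0)
            continue
        for j in range(len(w_pieces) - 1):
            if v_pieces[j] < z[i] < v_pieces[j + 1]:
                g_dz[i] = (v_pieces[j + 1] - v_pieces[j]) / (w_pieces[j + 1] - w_pieces[j])
    return g_dz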
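One loop-free approach, sketched under assumptions: prepend the origin to both breakpoint lists, compute one slope per segment up front, and use torch.bucketize to find which segment each z falls into. It assumes v_pieces is sorted in strictly increasing order with v_pieces[0] > 0 (bucketize requires sorted boundaries), that all three tensors share a floating dtype, and that, as in the loop, points outside every segment or exactly on a breakpoint get slope 0. fit_dz_vectorized is a hypothetical name.

```python
import torch


def fit_dz_vectorized(z, v_pieces, w_pieces):
    """Piecewise slope lookup without Python loops or if-statements."""
    zero = torch.zeros(1, dtype=z.dtype, device=z.device)
    V = torch.cat([zero, v_pieces])  # segment boundaries in z-space, starting at 0
    W = torch.cat([zero, w_pieces])  # matching boundaries in w-space
    slopes = (V[1:] - V[:-1]) / (W[1:] - W[:-1])  # one slope per segment
    n = slopes.numel()

    # With right=False, bucketize returns idx such that V[idx-1] < z <= V[idx]
    idx = torch.bucketize(z, V)
    seg = (idx - 1).clamp(0, n - 1)
    # Strict inequalities as in the loop: exclude z == V[idx] and z outside (0, V[-1])
    inside = (idx >= 1) & (idx <= n) & (z < V[idx.clamp(max=n)])
    return torch.where(inside, slopes[seg], torch.zeros_like(z))
```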

python – loops in the PyTorch implementation

I am trying to implement a regularization term for the loss function of a neural network.

from torch import nn
import torch
import numpy as np

reg_sig = torch.randn((32, 9, 5))
reg_adj = torch.randn((32, 9, 9, 4))

Maug = reg_adj.shape[0]

n_node = 9
n_bond_features = 4
n_atom_features = 5

SM_f = nn.Softmax(dim=2)
SM_W = nn.Softmax(dim=3)

p_f = SM_f(reg_sig)
p_W = SM_W(reg_adj)

Sig = nn.Sigmoid()

q = 1 - p_f[:, :, 4]
A = 1 - p_W[:, :, :, 0]

A_0 = torch.eye(n_node)
A_0 = A_0.reshape((1, n_node, n_node))
A_i = A

B = A_0.repeat(reg_sig.size(0), 1, 1)

for i in range(1, n_node):
    A_i = Sig(100 * (torch.bmm(A_i, A) - 0.5))

    B += A_i

C = Sig(100 * (B - 0.5))

reg_g_ij = torch.randn((reg_sig.size(0), n_node, n_node))

for i in range(n_node):
    for j in range(n_node):
        reg_g_ij[:, i, j] = q[:, i] * q[:, j] * (1 - C[:, i, j]) + (1 - q[:, i] * q[:, j]) * C[:, i, j]

I believe my implementation is not computationally efficient and would like suggestions on which parts to change. In particular, I want to get rid of the loops and, if possible, replace them with matrix operations. Suggestions, working examples or links to useful built-in functions are welcome.
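The double loop over i and j only combines q[:, i], q[:, j] and C[:, i, j] elementwise, so broadcasting can replace it: q.unsqueeze(2) * q.unsqueeze(1) builds the (batch, n_node, n_node) tensor of all products q[:, i] * q[:, j] in one operation. A minimal sketch, assuming q has shape (batch, n_node) and C has shape (batch, n_node, n_node) as in the question (pairwise_reg is a hypothetical name):

```python
import torch


def pairwise_reg(q, C):
    """Vectorized form of the i/j double loop.

    q: (batch, n_node), C: (batch, n_node, n_node).
    """
    qq = q.unsqueeze(2) * q.unsqueeze(1)  # qq[b, i, j] = q[b, i] * q[b, j]
    return qq * (1 - C) + (1 - qq) * C
```

The earlier loop that repeatedly calls torch.bmm is harder to remove, because each step applies a sigmoid to the previous result; with n_node = 9 it runs only 8 times, so it is unlikely to be the bottleneck.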

PyTorch Unit Test in Python – Code Review Stack Exchange

I'm new to PyTorch and am writing a unit test for an activation function that I'm implementing.

I plan to test this function against a reference implementation. I want to work test-driven, so I first learned to write a test for a known-good function: the ReLU implementation "MyReLU" from this beginner's tutorial.

The tests pass, but can I improve the following code in some way? I am worried that I am not making full use of PyTorch's libraries and features.

import unittest
import numpy as np
import torch
from torch.autograd import gradcheck
from my_activation_functions import MyReLU

class ReluTest(unittest.TestCase):
    def setUp(self):
        self.relu = MyReLU.apply

    def test_relu_values_x_leqz(self):
        tin_leqz = torch.tensor(np.linspace(-10,0,300))
        tout_leqz = list(self.relu(tin_leqz))
        for x in tout_leqz:
            self.assertEqual(x, 0)

    def test_relu_values_x0(self):
        tin_eqz = torch.tensor((0,0,0,0,0))
        tout_eqz = list(self.relu(tin_eqz))
        for x in tout_eqz:
            self.assertEqual(x, 0)

    def test_relu_values_x_geqz(self):
        tin_geqz = torch.tensor(np.linspace(0.001,10,300))
        tout_geqz = list(self.relu(tin_geqz))
        test_geqz = list(tin_geqz)
        for ii in range(len(tout_geqz)):
            self.assertEqual(tout_geqz[ii], test_geqz[ii])

    def test_drelu_values(self):
        tin = (torch.randn(20,20,dtype=torch.double,requires_grad=True))
        self.assertTrue(gradcheck(self.relu, tin, eps=1e-6, atol=1e-4))

if __name__ == '__main__':
    unittest.main()
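A possible tidier version, sketched under assumptions: it uses torch.nn.functional.relu as the reference implementation and torch.testing.assert_close (available in recent PyTorch releases), which compares whole tensors, including shape, dtype and values within a tolerance, instead of looping over Python lists. Because my_activation_functions is not available here, the sketch defines a tutorial-style MyReLU as a stand-in; swap the original import back in for real use. The gradcheck input is nudged away from 0 because ReLU is not differentiable there and finite differences could straddle the kink.

```python
import unittest

import torch
import torch.nn.functional as F
from torch.autograd import gradcheck


class MyReLU(torch.autograd.Function):
    """Stand-in for my_activation_functions.MyReLU (tutorial-style ReLU)."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


class ReluTest(unittest.TestCase):
    def setUp(self):
        self.relu = MyReLU.apply

    def test_matches_reference(self):
        # One tensor covering negative, zero and positive inputs
        x = torch.linspace(-10, 10, 601, dtype=torch.double)
        torch.testing.assert_close(self.relu(x), F.relu(x))

    def test_zero_input(self):
        x = torch.zeros(5, dtype=torch.double)
        torch.testing.assert_close(self.relu(x), x)

    def test_gradcheck(self):
        # Push inputs at least 0.1 away from the kink at 0
        x = torch.randn(20, 20, dtype=torch.double)
        x = (x + 0.1 * x.sign()).requires_grad_()
        self.assertTrue(gradcheck(self.relu, (x,), eps=1e-6, atol=1e-4))
```

Run it with python -m unittest followed by the module name.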

python – PyTorch implementation of a NN that is stuck at a loss value regardless of parameters

I implemented a simple PyTorch network on the Titanic data set that predicts the user ID from each record (treated as a multi-class classification problem). I'm just running this experiment to check how the network works. To make the task easier for the network, I added the user IDs as a dummy feature to the input data.

Below is the code I wrote

import torch

lr = 100
class Network(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Network, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(self.hidden_size, self.output_size)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.sigmoid(hidden)
        output = self.fc2(relu)
        output = self.sigmoid(output)
        return output

network = Network(2577, 1024, 712)

criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam
net_optimizer = optimizer(network.parameters(), lr=lr)    

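for epoch in range(20):
    y_pred = network(X)  # X: input features, y_advr: one-hot targets (not shown)
    loss = criterion(y_pred, y_advr)

    net_optimizer.zero_grad()
    loss.backward()
    net_optimizer.step()

    print('Epoch {}: loss: {}'.format(epoch, loss.item()))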

When I run this code, the loss decreases and then gets stuck at around 0.69. I can't understand why it is this particular number.

Changing the learning rate (lr) also produces behavior I can't explain (I've tried both the SGD and Adam optimizers): with a low learning rate the loss slowly converges to 0.69, but with an arbitrarily large learning rate the model does not diverge, although in theory it should.

I assume I made a mistake in my code that I can't find. Any help would be appreciated!
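One thing worth checking, offered as a likely explanation rather than a confirmed one: forward() ends with self.sigmoid, and torch.nn.BCEWithLogitsLoss applies a sigmoid internally, so the outputs are squashed twice. Values already in (0, 1) become values in roughly (0.5, 0.73) inside the loss, so for the many zero entries of each one-hot target row the per-element loss cannot drop below -log(1 - 0.5) = ln 2 ≈ 0.693, and the average cannot fall noticeably below that. That matches the plateau, and saturated sigmoids would also explain why a huge learning rate does not make the loss diverge. The toy batch below (8 samples, 712 classes, one-hot targets, all hypothetical) compares sigmoid outputs pinned at 0 with raw logits:

```python
import math

import torch

criterion = torch.nn.BCEWithLogitsLoss()

# Hypothetical batch: 8 samples, 712 classes, one-hot targets
targets = torch.zeros(8, 712)
targets[torch.arange(8), torch.arange(8)] = 1.0

# Case 1: the network already applied a sigmoid, so its outputs lie in (0, 1).
# Pushing them all to 0 is the best it can do for the zero targets, but the loss
# applies sigmoid again and sigmoid(0) = 0.5.
sigmoid_outputs = torch.zeros(8, 712)
loss_sigmoid = criterion(sigmoid_outputs, targets).item()

# Case 2: raw logits (no final sigmoid) are free to move far from 0.
logits = torch.full((8, 712), -10.0)
logits[torch.arange(8), torch.arange(8)] = 10.0
loss_logits = criterion(logits, targets).item()

print(loss_sigmoid, math.log(2))  # both about 0.6931
print(loss_logits)                # about 4.5e-05
```

If that is the cause, returning fc2's output directly (without the final sigmoid) should let the loss go below ln 2. Separately, forward() applies self.sigmoid after fc1 where self.relu appears to be intended, and for exactly one class per sample torch.nn.CrossEntropyLoss with integer class indices is the more usual choice.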


pytorch – Check whether the expected tensor backend is CUDA or CPU

  • I'm trying to run code on both the CPU and CUDA.
  • The problem arises when creating tensors, because I need to know which device to create them on.

I need to determine whether the computer expects a CUDA or a CPU tensor before the tensor is created.


def initilize(self, input):
    self.x = torch.nn.Parameter(torch.zeros((1, M)))

def run(self, x, state):
    B = ((self.x, h)

This gives:
Error: 'Expected object of backend CUDA but got backend CPU for argument #1'


My attempted fix:

def initilize(self, input):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    if expecting_cuda:
        self.x = torch.nn.Parameter(torch.zeros((1, M)).to(device))
    else:
        self.x = torch.nn.Parameter(torch.zeros((1, M)))

def run(self, h):
    B = ((self.x, h)
  • Question:
    How can I find out what the computer expects?

  • Limitations:
    I am working with a predefined "check" procedure, so I cannot add an argument to "initilize" that tells it whether to use CUDA or CPU.
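A common pattern, sketched under assumptions: instead of asking what the computer expects, create the tensor on the same device as a tensor you already have. initilize receives input, so input.device is available there; alternatively, because x is registered as a parameter of an nn.Module, calling .to(device) on the module moves it along with everything else. Cell, M = 4 and the torch.cat call in run are placeholders, since the question does not show which operation run performs.

```python
import torch
from torch import nn

M = 4  # hypothetical size; M is not defined in the question


class Cell(nn.Module):
    """Hypothetical module mirroring the question's initilize/run methods."""

    def initilize(self, input):
        # Create the parameter on whatever device the incoming tensor lives on
        self.x = nn.Parameter(torch.zeros((1, M), device=input.device))

    def run(self, h):
        # Defensive alternative at use time: match h's device before combining
        return torch.cat((self.x.to(h.device), h), dim=0)
```

With this, a CPU input yields a CPU parameter and a CUDA input yields a CUDA parameter, without passing any extra flag to initilize.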

anaconda – confused about the pytorch cuda version

I am fairly new to coding with PyTorch and CUDA, so I set up an environment with Anaconda and tried out a few things from some tutorials.

The thing is, I'm still confused about the CUDA version that comes with the PyTorch package (e.g. pytorch cuda90), which refers to CUDA 9.0.

My question is: what does this number refer to? Is it tied to the GPU? Because if I run, for example, the nvcc --version command, it shows that I have

Cuda compilation tools, release 10.0

but what I installed in Anaconda was pytorch cuda90, and it works! I'm not even sure why I chose that one.

Could someone please clarify?
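My understanding, stated as such: the number in the package name is the CUDA runtime version the PyTorch binary was built against, and the conda package brings that runtime with it, so the prebuilt binaries do not use the toolkit that nvcc --version reports. What the machine must provide is an NVIDIA driver new enough for that runtime, which is why a CUDA 9.0 build can run on a system whose nvcc is 10.0. A small, hypothetical helper to see what your installed build actually uses:

```python
import torch


def describe_cuda_setup():
    """Summarize the CUDA version PyTorch was built with and what it can see at runtime."""
    info = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if torch.cuda.is_available():
        info["device_name"] = torch.cuda.get_device_name(0)
        info["cudnn"] = torch.backends.cudnn.version()
    return info


for key, value in describe_cuda_setup().items():
    print("{}: {}".format(key, value))
```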

Tensorflow – PPO: PyTorch implementation vs. TF

I'm working through OpenAI's Spinning Up course on RL. To consolidate the concepts, I'm trying to rewrite the PPO algorithm in PyTorch.

My TF rewrite works really well, while my PyTorch version is volatile (though it does improve over time).

I have narrowed the difference down to one specific part: how the loss -> backprop -> gradient-step iterations are applied to the trajectories used in each update.

The question I'm trying to answer is whether the following two implementations are equivalent, assuming both compute pi_loss and v_loss identically. The TF version works perfectly, while the PyTorch version, with the same hyperparameters and an identical model, is all over the place.


In TF:

# Policy gradient step
for _ in range(train_pi_iters):
    _ = sess.run(train_pi, feed_dict=input)

# Value function learning
for _ in range(train_v_iters):
    _ = sess.run(train_v, feed_dict=input)

# where
train_pi = tf.train.AdamOptimizer(learning_rate=pi_lr).minimize(pi_loss)
train_v = tf.train.AdamOptimizer(learning_rate=v_lr).minimize(v_loss)

In PyTorch:

# Policy gradient step
for _ in range(train_pi_iters):

    # ... loss computation here ... #

    self.pi_op.zero_grad()
    pi_loss.backward()
    self.pi_op.step()

# Value function learning
for _ in range(train_v_iters):

    # ... loss computation here ... #

    self.c_op.zero_grad()
    v_loss.backward()
    self.c_op.step()

Assuming pi_loss and v_loss are computed identically, are the two versions above equivalent?

Alternatively, we could move PyTorch's zero_grad() and step() calls outside the loops, accumulating the gradients and performing a single batch update, e.g.:

self.pi_op.zero_grad()
self.c_op.zero_grad()

# Policy gradient step
for i in range(train_pi_iters):

    # ... loss computation here ... #

    pi_loss.backward()

# Value function learning
for _ in range(train_v_iters):

    # ... loss computation here ... #

    v_loss.backward()

self.pi_op.step()
self.c_op.step()

Which option is closer to what TF does for each sess.run(train_x)?

Note: In the actual implementation I use early stopping based on an approximate KL divergence, but that calculation is identical in both versions.

Many Thanks!!
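My understanding, stated as an assumption: each sess.run(train_pi, ...) executes the graph once, recomputing pi_loss with the current weights, computing its gradient and applying one Adam update. That corresponds to the first PyTorch version, provided the loss is recomputed inside every loop iteration. The accumulated version is different: the parameters do not change between backward() calls, so it adds up the same gradient train_pi_iters times and then takes a single step. The toy example uses a one-parameter quadratic and plain SGD only because the arithmetic is easy to check (the same structural difference applies to Adam); sequential_updates and accumulated_update are hypothetical names.

```python
import torch


def sequential_updates(loss_fn, optimizer, iters):
    """Like calling sess.run(train_op) `iters` times: one full update per iteration."""
    for _ in range(iters):
        optimizer.zero_grad()
        loss = loss_fn()  # recomputed with the current parameters
        loss.backward()
        optimizer.step()


def accumulated_update(loss_fn, optimizer, iters):
    """The alternative: gradients from `iters` passes are summed, then one step."""
    optimizer.zero_grad()
    for _ in range(iters):
        loss_fn().backward()  # parameters unchanged, so the same gradient is added each time
    optimizer.step()


w_seq = torch.tensor([1.0], requires_grad=True)
sequential_updates(lambda: ((w_seq - 3) ** 2).sum(), torch.optim.SGD([w_seq], lr=0.1), 3)

w_acc = torch.tensor([1.0], requires_grad=True)
accumulated_update(lambda: ((w_acc - 3) ** 2).sum(), torch.optim.SGD([w_acc], lr=0.1), 3)
```

Starting from w = 1 with loss (w - 3)^2 and lr = 0.1, the sequential version moves w to 1.4, 1.72 and then 1.976, whereas the accumulated version applies a single step with the summed gradient -12 and lands at 2.2.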