Efficient eigenvalue computation for Hessian of neural networks

I train a neural network – one of the ResNet variants ($\approx 10^7$ parameters) – on the CIFAR-10 dataset, and after each epoch I would like to find the smallest/largest eigenvalues of its Hessian. For that, I can use Hessian-vector products (i.e. $f(v) = Hv$, where $H$ is the Hessian corresponding to the batch I'm currently using; PyTorch has a built-in mechanism for that), so, for example, I can use the power method.
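
For concreteness, here is a minimal sketch (not my actual code) of the Hessian-vector-product / power-iteration setup I mean, assuming `model`, `loss_fn` and a batch `x, y` are already defined:

import torch

def hvp(loss, params, v):
    # Hv = d/dtheta <grad(loss), v>, computed with two autograd passes
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    Hv = torch.autograd.grad(flat_grad @ v, params)
    return torch.cat([h.reshape(-1) for h in Hv])

def top_eigenvalue(model, loss_fn, x, y, iters=100):
    params = [p for p in model.parameters() if p.requires_grad]
    n = sum(p.numel() for p in params)
    v = torch.randn(n, device=x.device)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        loss = loss_fn(model(x), y)      # rebuild the graph each iteration
        Hv = hvp(loss, params, v)
        eig = torch.dot(Hv, v).item()    # Rayleigh quotient estimate
        v = Hv / (Hv.norm() + 1e-12)
    return eig

Power iteration on $H$ converges to the eigenvalue of largest magnitude; for the other end of the spectrum the same loop can be run on the shifted operator $\lambda_{\max} I - H$ (replace `Hv` with `lam_max * v - Hv`), whose top eigenvector corresponds to the smallest eigenvalue of $H$.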

Question: do you know of an efficient algorithm for this task? To be precise, I mean an algorithm with which both eigenvalues can be computed with a reasonable multiplicative error within at most 10 minutes.

Note that I'm asking about an algorithm that you know to be efficient for this (or a similar) problem. I have tried the power method, the accelerated power method, Oja's algorithm, a gradient-based algorithm and its accelerated version, and the algorithms from https://arxiv.org/abs/1707.02670. All these experiments take a lot of time and so far I haven't had any success, no matter how much engineering I used. When eigenvalues are close to $0$ (e.g. of order $-\frac12$), either convergence takes a lot of time or the results are unstable/unreliable.

neural networks – Binary cross-entropy derivative?

Here is the definition of cross-entropy for Bernoulli random variables $\operatorname{Ber}(p), \operatorname{Ber}(q)$, taken from Wikipedia:
$$
H(p,q) = p \log \frac{1}{q} + (1-p) \log \frac{1}{1-q}.
$$

This is exactly what your first function computes.

The partial derivative of this function with respect to $p$ is
$$
\frac{\partial H(p,q)}{\partial p} = \log \frac{1}{q} - \log \frac{1}{1-q} = \log \frac{1-q}{q}.
$$

The partial derivative of this function with respect to $q$ is
$$
\frac{\partial H(p,q)}{\partial q} = -\frac{p}{q} + \frac{1-p}{1-q} = \frac{(1-p)q - p(1-q)}{q(1-q)} = \frac{q-p}{q(1-q)}.
$$

This is exactly what your third function computes.

I’m not sure what your second function computes. Also, there is no reason to expect that the partial derivative with respect to one variable will be the same as the partial derivative with respect to another variable.
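
If you want to convince yourself numerically, here is a quick finite-difference check of both partial derivatives (a standalone sketch, not your functions):

import math

def H(p, q):
    return p * math.log(1 / q) + (1 - p) * math.log(1 / (1 - q))

def dH_dp(p, q):
    return math.log((1 - q) / q)

def dH_dq(p, q):
    return (q - p) / (q * (1 - q))

p, q, eps = 0.3, 0.7, 1e-6
print(dH_dp(p, q), (H(p + eps, q) - H(p - eps, q)) / (2 * eps))  # these should match
print(dH_dq(p, q), (H(p, q + eps) - H(p, q - eps)) / (2 * eps))  # and so should these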

machine learning – Designing a neural network with LSTM and feedforward NN combination

Currently, I’m designing a neural network that works with reinforcement learning. In summary, the agent takes in information about itself and nearby agents and, in conjunction with global world information, makes a decision.

I'm currently thinking of implementing this as an LSTM that takes in information about itself and a variable number of nearby agents, and a feedforward neural network that combines the LSTM output with the global world information to produce an action.

Would this approach be sufficient to produce meaningful results? I thought that another approach would be to feed the global world information together with each agent into each LSTM cell, though that may use far more resources (resources during forward propagation are a main concern for this project). Also, if the second approach is used, how would I be able to link the inputs to the outputs if they have different shapes (I'm attempting to learn without a library)? How would I be able to map an input of shape (1, x, 6) to (1, 1, 4) or (1, 4)?
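
Just to pin down the shapes, here is a minimal PyTorch sketch of the first design (the sizes are made up for illustration: 6 features per nearby agent, 10 global features, 4 actions). The LSTM's final hidden state collapses the variable-length agent dimension, which is what maps (1, x, 6) down to (1, 4):

import torch
import torch.nn as nn

class AgentPolicy(nn.Module):
    def __init__(self, agent_feats=6, global_feats=10, hidden=32, n_actions=4):
        super().__init__()
        self.lstm = nn.LSTM(agent_feats, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + global_feats, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, agents, world):
        # agents: (1, x, 6) with a variable number x of nearby agents
        # world:  (1, global_feats)
        _, (h_n, _) = self.lstm(agents)   # h_n: (1, 1, hidden)
        summary = h_n[-1]                 # (1, hidden) -- the x dimension is gone
        return self.head(torch.cat([summary, world], dim=1))  # (1, n_actions)

policy = AgentPolicy()
action_logits = policy(torch.randn(1, 5, 6), torch.randn(1, 10))
print(action_logits.shape)  # torch.Size([1, 4])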

neural networks – Can't understand what an activation function is

Okay, I have been trying to learn neural networks from scratch, in Python. Wherever neural networks are discussed, nobody ever explains what the activation of a neuron actually means. I don't get this and need to understand the basic meaning of activation. Is it like activated vs. not activated (some kind of boolean)? The definition of bias says that it helps to compensate (shift) the activation of the neuron, which I also don't follow. And why do we need an activation function at all? What is it doing? I'm not able to understand.
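
To make the question concrete, here is my current (possibly wrong) picture of a single neuron in plain Python: the weighted sum plus bias is the "pre-activation", and the activation function (a sigmoid here) squashes it into a bounded firing strength rather than a hard on/off boolean. Is this the right idea?

import math

def neuron_output(inputs, weights, bias):
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-pre_activation))  # sigmoid activation

print(neuron_output([0.5, -1.0], [2.0, 1.0], bias=0.1))  # some value between 0 and 1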

Neural DSP Parallax v1.0.0 WiN


Neural DSP Parallax v1.0.0 WiN | 12 Mb
Parallel bass processing has been used for decades. Dual rigs or multiple plugins would be configured to distort treble for clarity and aggression, and compress lows for a massive foundation. Encompassing over a decade of experience engineering some of the most devastating bass distortions on the planet, Parallax provides everything you need to design the…

Build a neural network in Python without using numpy but only the math library

So, first I process the data as below:

# Load a CSV file
from csv import reader

def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset


# Convert string values to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())


filename = 'gameIS.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])):
    str_column_to_float(dataset, i)


# Find the min and max values for each column

def dataset_minmax(dataset):
    minmax = list()
    stats = [(min(column), max(column)) for column in zip(*dataset)]
    return stats


minmax = dataset_minmax(dataset)


# Rescale the dataset columns to the range 0-1
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row) - 1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
    # return dataset


# print(minmax)
normalize_dataset(dataset, minmax)
# dataset
if __name__ == '__main__':
    load_csv('gameIS.csv')
Then I try to build a neural network without using numpy, as below:


from random import random
from math import exp
import matplotlib.pyplot as plt


class Neuron(object):

    def __init__(self):
        self.network = None

    def initialize_network(self, n_inputs, n_hidden, n_outputs):
        """
        Initializes a network by taking the number of inputs, the number of neurons to have in the hidden layer and
        the number of outputs in output layer
        Network is a list of two layers (lists): hidden layer and output layer
        Each layer has neurons, where each neuron is a dictionary of weights and biases
        """
        self.network = list()
        hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
        self.network.append(hidden_layer)
        output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
        self.network.append(output_layer)
        return self.network

    # Calculate neuron activation for an input
    def activateself(self, weights, inputs):
        activation = weights[-1]  # weights[-1] = bias
        for i in range(len(weights) - 1):
            activation += weights[i] * inputs[i]
        return activation

    # Transfer neuron activation
    def transfer(self, activation):
        return 1.0 / (1.0 + exp(-activation))

    # Forward propagate input to a network output
    def forward_propagate(self, row):
        inputs = row
        for layer in self.network:
            new_inputs = []
            # print("Output of neurons: ")
            for neuron in layer:
                activation = self.activate(neuron['weights'], inputs)
                neuron['output'] = self.transfer(activation)
                # print(neuron['output'])
                new_inputs.append(neuron['output'])
            inputs = new_inputs
        return inputs

    # Calculates the derivative of a neuron output
    def transfer_dervative(self, output):
        lamda = 0.8
        return output * (1.0 - output) * lamda

    def backward_propogate_error(self, expected):
        for i in reversed(range(len(self.network))):
            layer = self.network[i]
            # print(layer)
            # print("Error terms at layers: ")
            errors = list()
            if i != len(self.network) - 1:
                for j in range(len(layer)):
                    error = 0.0
                    for neuron in self.network[i + 1]:
                        error += neuron['weights'][j] * neuron['delta']
                        # error = (weight_k * error_j) * transfer_derivative(output)
                        # Where error_j is the error signal from the jth neuron in the output layer,
                        # weight_k is the weight that connects the kth neuron to the current neuron and output is the output for the current neuron.
                    errors.append(error)
                    # print(errors)
            else:
                for j in range(len(layer)):
                    neuron = layer[j]
                    errors.append(expected[j] - neuron['output'])
                # print(errors)
            for j in range(len(layer)):
                neuron = layer[j]
                neuron['delta'] = errors[j] * self.transfer_dervative(neuron['output'])
                # error_at_output_layer = (expected - output of layer) * transfer_derivative(output)

    # Update network weights with error
    def update_weights(self, row, l_rate):
        for i in range(len(self.network)):
            inputs = row[:-2]
            # print(inputs)
            if i != 0:
                inputs = [neuron['output'] for neuron in self.network[i - 1]]  # Output layer input = output of hidden layer
                # print(inputs)
            for neuron in self.network[i]:
                # print(neuron)
                for j in range(len(inputs)):
                    # weight = weight + learning_rate * error * input
                    neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
                neuron['weights'][-1] += l_rate * neuron['delta']  # updating bias
                # print(neuron['weights'][-1])

    def train_network(self, train, l_rate, n_epochs, n_outputs):
        print(len(train))
        for epoch in range(n_epochs):
            Total_error = 0
            for row in train:
                sum_error_row = 0
                outputs = self.forward_propagate(row[:2])
                # print("Predicted:",outputs)
                expected = row[-2:]
                # print("expected:",expected)
                sum_error_row += sum((expected[i] - outputs[i]) ** 2 for i in range(len(expected)))
                # print("sum_row:",sum_error_row)
                self.backward_propogate_error(expected)
                self.update_weights(row, l_rate)
                Total_error += sum_error_row
            # print(Total_error)
            print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, Total_error / len(train)))

if __name__ == '__main__':
    neural_net = Neuron()
    neural_net.initialize_network(2, 4, 2)
    neural_net.forward_propagate((1,1))

The problem is I am getting the following error:

C:\Python3\python.exe M:/neural_net/network.py
Traceback (most recent call last):
  File "M:/neural_net/network.py", line 120, in <module>
    neural_net.forward_propagate((1,1))
  File "M:/neural_net/network.py", line 44, in forward_propagate
    activation = self.activate(neuron['weights'], inputs)
AttributeError: 'Neuron' object has no attribute 'activate'
 

The error refers to the activation function which multiplies the weights with the inputs.
Does anyone know how to remedy this particular error in the code?

machine learning – Neural Networks

Define a neural network (with all edge weights) that outputs +1 inside the triangle defined by the points (1, 2), (4, −1) and (1, −1), and −1 outside this triangle.

Approach: define three hyperplanes, one for each edge of the triangle. That gives you 3 nodes in the hidden layer. The output layer simply takes the logical AND of the hidden layer by summing.
Is this correct? Can somebody elaborate on how exactly to solve this?
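
Here is a small Python sketch of what I think the construction looks like (sign activations for the three half-planes, oriented toward the interior, and a threshold unit that ANDs them); I would appreciate confirmation that the weights are right:

import numpy as np

def sign(z):
    return np.where(z >= 0, 1.0, -1.0)

def triangle_net(x, y):
    # Hidden layer: one half-plane test per edge, oriented toward the interior.
    h1 = sign(x - 1)        # to the right of the vertical edge x = 1
    h2 = sign(y + 1)        # above the horizontal edge y = -1
    h3 = sign(3 - x - y)    # below the line x + y = 3 through (1, 2) and (4, -1)
    # Output layer: sums the hidden units and thresholds at 2.5, i.e. a logical AND.
    return sign(h1 + h2 + h3 - 2.5)

print(triangle_net(2.0, 0.0))   # inside the triangle  -> 1.0
print(triangle_net(0.0, 0.0))   # outside the triangle -> -1.0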

mathematical optimization – How to train each layer in a Neural Network so they optimize different loss functions in an adversarial network?

Example NetGraph to illustrate the idea (Input is an online signal with a value between -1 and 1, Noise is Gaussian centered at 0 with standard deviation 0.1, and EvilNet is constrained to output a value between Input-0.2 and Input+0.2):
Example NetGraph

I am trying to train “GoodNet” in the network in a way that minimizes “GoodLoss”, while training “EvilNet” in a way that minimizes “EvilLoss” (maximizing “GoodLoss”).

I have tried using TrainingUpdateSchedule, but I can't find a way to indicate which loss function should be minimized at each step.

I have also tried alternating calls to NetTrain with different parameters (freezing GoodNet and minimizing EvilLoss, and then freezing EvilNet and minimizing GoodLoss), but the process is very inefficient, since the momenta for the ADAM method are reset each time NetTrain is called, and there is also significant overhead.
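
For comparison, this is the alternating pattern I am after, written as a PyTorch sketch (not Wolfram Language), where both Adam optimizers and their momenta persist across the alternation; the tiny linear nets, the losses and the ±0.2 constraint below are only placeholders for illustration:

import torch
import torch.nn as nn

good_net = nn.Linear(1, 1)   # placeholder GoodNet
evil_net = nn.Linear(1, 1)   # placeholder EvilNet
opt_good = torch.optim.Adam(good_net.parameters(), lr=1e-3)
opt_evil = torch.optim.Adam(evil_net.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.rand(32, 1) * 2 - 1                          # "Input" in [-1, 1]
    perturbed = x + 0.2 * torch.tanh(evil_net(x))          # EvilNet's bounded perturbation
    good_loss = (good_net(perturbed) - x).pow(2).mean()    # GoodNet tries to undo it
    if step % 2 == 0:
        opt_good.zero_grad()
        good_loss.backward()
        opt_good.step()            # minimize GoodLoss; EvilNet's parameters stay untouched
    else:
        opt_evil.zero_grad()
        (-good_loss).backward()    # EvilLoss = -GoodLoss
        opt_evil.step()            # maximize GoodLoss; GoodNet's parameters stay untouched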

python – Implementing Convolutional Neural Network

Context

I was making a Convolutional Neural Network from scratch in Python. I completed making it, and it works fine. The only thing is that it takes a lot of time as the size of the input grows.

Code

import numpy as np
import math

class ConvolutionalNeuralNetwork():
    def __init__(self, num_of_filters, kernel_shape, stride):
        self.num_of_filters = num_of_filters
        self.kernel_shape = kernel_shape
        self.stride = stride
        self.kernels = []

        # Initialize the kernels with random weights
        for i in range(self.num_of_filters):
            kernel = np.random.uniform(-1, 1, size=(3, 3))
            self.kernels.append(kernel)
        self.kernels = np.array(self.kernels)

    def ElementWiseAddition(self, images):
        if np.array(images).shape[0] == 1:
            return images[0]

        resultant_image = images[0]
        for image in images[1:]:
            resultant_image = np.add(image, resultant_image)
            resultant_image = resultant_image.astype(float)
            resultant_image /= 2.0

        return resultant_image

    def GetOutput(self, x):
        filter_maps = []
        for filter_n in range(self.num_of_filters):
            kernel_n_filter_maps = []
            for image in x:
                filter_map = []
                for i in range(0, (image.shape[0] - 3) + 1, self.stride):
                    row = []
                    for j in range(0, (image.shape[1] - 3) + 1, self.stride):
                        piece = image[i:i+3, j:j+3]
                        value = np.sum(np.multiply(self.kernels[filter_n], piece))

                        # ReLU-style activation: clamp negative values to zero
                        if value < 0.0:
                            value = 0
                        row.append(value)
                    filter_map.append(row)
                kernel_n_filter_maps.append(filter_map)
            filter_maps.append(self.ElementWiseAddition(kernel_n_filter_maps))
        return np.array(filter_maps)

input = np.random.uniform(-1, 1, size=(512, 4, 4))

ConvolutionalNN = ConvolutionalNeuralNetwork(1028, (3,3), stride=1)
output = ConvolutionalNN.GetOutput(input)
print(output.shape)

How can I make this code consume less time and make it more efficient?
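
One common direction (a sketch under my assumptions, not a drop-in replacement for the class above) is to replace the four nested Python loops with a vectorized im2col-style convolution, for example using numpy's sliding_window_view; note that it only roughly mirrors ElementWiseAddition, since that method computes a running average rather than a plain mean:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_all_filters(images, kernels, stride=1):
    # images: (num_images, H, W), kernels: (num_filters, 3, 3)
    windows = sliding_window_view(images, (3, 3), axis=(1, 2))   # (N, H-2, W-2, 3, 3)
    windows = windows[:, ::stride, ::stride]
    # Contract each 3x3 patch against every kernel in one shot.
    out = np.tensordot(windows, kernels, axes=([3, 4], [1, 2]))  # (N, H', W', num_filters)
    out = np.maximum(out, 0.0)                                   # ReLU-style clamp
    return out.mean(axis=0).transpose(2, 0, 1)                   # (num_filters, H', W')

images = np.random.uniform(-1, 1, size=(512, 4, 4))
kernels = np.random.uniform(-1, 1, size=(1028, 3, 3))
print(conv_all_filters(images, kernels).shape)   # (1028, 2, 2)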