A Gentle, Minimalist Introduction to Machine Learning

Hello everybody! Recently, I’ve been spending non-trivial amounts of time on the fascinating subject of artificial intelligence. It’s come a long way! With the release of Midjourney and ChatGPT, among other products, 2023 looks to be extremely promising, even revolutionary.

I’d like to recommend the following tutorial:

It is simple, sufficiently detailed, does not use TensorFlow, and produces a picture at the end!

The complete code to run the example is reproduced below:

import matplotlib.pyplot as plt
import numpy as np

class NeuralNetwork:
    def __init__(self, learning_rate):
        self.weights = np.array([np.random.randn(), np.random.randn()])
        self.bias = np.random.randn()
        self.learning_rate = learning_rate

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def _sigmoid_deriv(self, x):
        return self._sigmoid(x) * (1 - self._sigmoid(x))

    def predict(self, input_vector):
        layer_1 = np.dot(input_vector, self.weights) + self.bias
        layer_2 = self._sigmoid(layer_1)
        prediction = layer_2
        return prediction

    def _compute_gradients(self, input_vector, target):
        layer_1 = np.dot(input_vector, self.weights) + self.bias
        layer_2 = self._sigmoid(layer_1)
        prediction = layer_2

        derror_dprediction = 2 * (prediction - target)
        dprediction_dlayer1 = self._sigmoid_deriv(layer_1)
        dlayer1_dbias = 1
        dlayer1_dweights = (0 * self.weights) + (1 * input_vector)

        derror_dbias = (
            derror_dprediction * dprediction_dlayer1 * dlayer1_dbias
        )
        derror_dweights = (
            derror_dprediction * dprediction_dlayer1 * dlayer1_dweights
        )

        return derror_dbias, derror_dweights

    def _update_parameters(self, derror_dbias, derror_dweights):
        self.bias = self.bias - (derror_dbias * self.learning_rate)
        self.weights = self.weights - (
            derror_dweights * self.learning_rate
        )

    def train(self, input_vectors, targets, iterations):
        cumulative_errors = []
        for current_iteration in range(iterations):
            # Pick a data instance at random
            random_data_index = np.random.randint(len(input_vectors))

            input_vector = input_vectors[random_data_index]
            target = targets[random_data_index]

            # Compute the gradients and update the weights
            derror_dbias, derror_dweights = self._compute_gradients(
                input_vector, target
            )

            self._update_parameters(derror_dbias, derror_dweights)

            # Measure the cumulative error for all the instances
            if current_iteration % 100 == 0:
                cumulative_error = 0
                # Loop through all the instances to measure the error
                for data_instance_index in range(len(input_vectors)):
                    data_point = input_vectors[data_instance_index]
                    target = targets[data_instance_index]

                    prediction = self.predict(data_point)
                    error = np.square(prediction - target)

                    cumulative_error = cumulative_error + error
                # Record the summed error so it can be plotted later
                cumulative_errors.append(cumulative_error)

        return cumulative_errors

input_vectors = np.array(
    [
        [3, 1.5],
        [2, 1],
        [4, 1.5],
        [3, 4],
        [3.5, 0.5],
        [2, 0.5],
        [5.5, 1],
        [1, 1],
    ]
)

targets = np.array([0, 1, 0, 1, 0, 1, 1, 0])

learning_rate = 0.01
neural_network = NeuralNetwork(learning_rate)

training_error = neural_network.train(input_vectors, targets, 1000)

plt.plot(training_error)
plt.xlabel("Iterations")
plt.ylabel("Error for all training instances")
plt.savefig("cumulative_error.png")

I would like to add some commentary of my own to this great tutorial.

First, the author writes that the resulting error after training doesn’t decrease because the dataset is tiny, with only 8 data points:

An astute student would note, however, that decreasing the learning rate and increasing the number of training iterations can slightly reduce the error or, failing that, at least reduce the variance of the error. The following is the plot of the error after decreasing the learning rate 10-fold and increasing the iterations 3-fold:

If we zoom in, the original error looks like this, where smaller is better:

So you can see the effect of reducing the learning rate on the error.
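For anyone who wants to try this comparison without the plots, here is a minimal sketch. It is a compact re-statement of the same single-neuron update, not the tutorial's class. The function name `train`, the fixed `seed`, and the concrete settings (0.01 and 1000 taken from the listing above, then 0.001 and 3000 as the 10-fold and 3-fold changes) are my assumptions for illustration.

```python
import numpy as np

input_vectors = np.array(
    [[3, 1.5], [2, 1], [4, 1.5], [3, 4], [3.5, 0.5], [2, 0.5], [5.5, 1], [1, 1]]
)
targets = np.array([0, 1, 0, 1, 0, 1, 1, 0])


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def train(inputs, labels, learning_rate, iterations, seed=0):
    """Hypothetical helper: single-neuron SGD, error recorded every 100 steps."""
    rng = np.random.default_rng(seed)  # fixed seed so both runs start identically
    weights = rng.standard_normal(2)
    bias = rng.standard_normal()
    errors = []
    for i in range(iterations):
        # Pick one data instance at random
        k = rng.integers(len(inputs))
        x, t = inputs[k], labels[k]
        prediction = sigmoid(np.dot(x, weights) + bias)
        # Chain rule: d(error)/d(layer_1) for squared error and a sigmoid
        grad = 2 * (prediction - t) * prediction * (1 - prediction)
        bias = bias - learning_rate * grad
        weights = weights - learning_rate * grad * x
        if i % 100 == 0:
            all_predictions = sigmoid(inputs @ weights + bias)
            errors.append(np.sum((all_predictions - labels) ** 2))
    return np.array(errors)


baseline = train(input_vectors, targets, learning_rate=0.01, iterations=1000)
slower = train(input_vectors, targets, learning_rate=0.001, iterations=3000)

for name, errs in [("baseline", baseline), ("slower", slower)]:
    print(f"{name}: final error {errs[-1]:.3f}, std of curve {errs.std():.3f}")
```

The printed standard deviation is only a rough stand-in for "variance of the error"; the numbers depend on the seed, so it is worth trying a few.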

My second comment is a question: what does all of this mean? Let’s plot the input data:

Red vectors should be categorized as “0”, and green vectors as “1”. The blue vector represents the learned weights of the network (there are only two, so I plot them as x, y).
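A plot like this can be produced roughly as follows. This is a sketch rather than the exact script behind my figure: the helper `plot_vectors`, the output file name, and the axis limits are hypothetical, and `learned_weights` is a made-up placeholder that you would replace with `neural_network.weights` after training.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

input_vectors = np.array(
    [[3, 1.5], [2, 1], [4, 1.5], [3, 4], [3.5, 0.5], [2, 0.5], [5.5, 1], [1, 1]]
)
targets = np.array([0, 1, 0, 1, 0, 1, 1, 0])

# Placeholder only: substitute neural_network.weights from a trained network
learned_weights = np.array([0.5, -0.5])


def plot_vectors(vectors, labels, weights, path="input_vectors.png"):
    """Draw each input as an arrow from the origin, coloured by its target."""
    colors = ["red" if label == 0 else "green" for label in labels]
    origin = np.zeros(len(vectors))
    fig, ax = plt.subplots()
    # angles/scale_units/scale make arrow lengths match the data coordinates
    ax.quiver(origin, origin, vectors[:, 0], vectors[:, 1], color=colors,
              angles="xy", scale_units="xy", scale=1)
    ax.quiver(0, 0, weights[0], weights[1], color="blue",
              angles="xy", scale_units="xy", scale=1)
    ax.set_xlim(-1, 6)
    ax.set_ylim(-1, 5)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)
    plt.close(fig)
    return colors


colors = plot_vectors(input_vectors, targets, learned_weights)
```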

Humans are great at pattern recognition. Just looking at the plot, you can see that a natural (if overfit) predictor for this data would be a vector pointing to the average of the red arrows, together with an activation function that specifies a radius around that average point to define the red cluster.
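To make that idea concrete, here is a minimal sketch of such a centroid-and-radius rule. The function names `fit_radius_classifier` and `predict_radius` are hypothetical, and choosing the radius as the distance to the farthest red point is my assumption; other choices are possible.

```python
import numpy as np

input_vectors = np.array(
    [[3, 1.5], [2, 1], [4, 1.5], [3, 4], [3.5, 0.5], [2, 0.5], [5.5, 1], [1, 1]]
)
targets = np.array([0, 1, 0, 1, 0, 1, 1, 0])


def fit_radius_classifier(vectors, labels):
    """Return the centroid of the class-0 points and a radius enclosing them all."""
    class_0 = vectors[labels == 0]
    centroid = class_0.mean(axis=0)
    # Assumption: the radius is the distance to the farthest class-0 point
    radius = np.linalg.norm(class_0 - centroid, axis=1).max()
    return centroid, radius


def predict_radius(vectors, centroid, radius):
    """Predict 0 inside the circle around the centroid, 1 outside it."""
    distances = np.linalg.norm(vectors - centroid, axis=1)
    return np.where(distances <= radius, 0, 1)


centroid, radius = fit_radius_classifier(input_vectors, targets)
predictions = predict_radius(input_vectors, centroid, radius)
accuracy = np.mean(predictions == targets)
print(f"centroid={centroid}, radius={radius:.3f}, accuracy={accuracy:.2f}")
```

On these eight points the red vector at [1, 1] sits far from the other red vectors, so the radius has to be large and two green vectors end up inside the circle as well. Treat the rule as an aid to intuition rather than a strong classifier.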

Of course, the advantage of a neural network is that it can classify (and perform other operations on) much more complex data, where plotting the inputs may be impossible. Nevertheless, for an introductory tutorial, I believe plotting inputs and outputs, whenever possible, is a nice way of developing intuition about mathematical concepts.

In the next article, we’ll go into detail about various types of ANNs and implement some more of these concepts.
