Remco Brilstra


[Image: schematic depiction of a human brain]

What are these Neural Nets Anyway?

With the introduction of ChatGPT, about two years ago, AI has become all the rage. Along with it, the term Neural Network comes up a lot, as neural networks are the underlying technology for these systems. However, for those who are not well versed in (big) data or machine learning, they remain a somewhat magical concept.
In this article we’ll have a look at what these networks are and how they work, without all the mathematics that is used to bring them to life, to get a better understanding of what they are.

Where are neural networks used?
Neural networks are by no means new; they have been around since the 1940s (more on that later), and they have grown steadily in capability and usage since.
They are the reason Netflix is able to suggest which show you might like to watch next, how a Tesla is able to understand the road ahead, how social media platforms fill your feed with content that interests you, and how Apple is able to recognize your face in order to unlock your phone.

As you can see, these networks were in your life long before ChatGPT and influence a lot of aspects of your daily life. All the more reason to get a basic understanding of what they are.

The basic concept of a Neural Network

Neural networks are a (greatly simplified) mathematical model of how the brain works. They are, as the name suggests, a network of neurons. Each neuron performs a simple mathematical calculation on the inputs it receives, and the result is forwarded to the next neurons. The amount of influence a single input has on the result of a neuron is determined by a weight. Changing these weights between neurons determines how much influence they have on each other and, in turn, changes the behavior of the network.

Neural networks are, in essence, a simple concept, but when these networks are scaled to ever greater sizes they turn out to be able to perform fascinatingly complex tasks.
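To make that concrete, here is a minimal sketch of a single artificial neuron in Python (all names and numbers are purely illustrative): it multiplies each input by a weight, sums the results, and squashes the total through an activation function.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs: list[float], weights: list[float]) -> float:
    """One neuron: weighted sum of the inputs, then an activation."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return sigmoid(total)

# Three inputs; the middle weight is 0, so that input has no influence.
print(neuron([0.5, 0.9, 0.2], [0.8, 0.0, 0.4]))
```

Everything that follows is, at heart, many copies of this little function wired together.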

A (very) brief history of Neural Networks

The history of Neural Networks (or AI in general) is a fascinating subject which deserves way more attention than I’ll give it here, but here is a short summary for some context.

The origin of neural networks stems from 1943, when Warren McCulloch and Walter Pitts published the paper “A Logical Calculus of the Ideas Immanent in Nervous Activity”, introducing the world to the concept. They created a simple mathematical model based on the knowledge of the human brain at the time and showed that a network of neurons, each performing a very simple mathematical operation, can be used to perform significantly more complex tasks (the whole is greater than the sum of its parts?).
In the 1980s, advancements were made in ‘training’ these models on examples (i.e. back-propagation), which allowed networks to learn how to perform a given complex operation. This elevated them from a theoretical to a practical tool, leading the way for implementations of OCR (Optical Character Recognition), which allowed computers to read written text in documents.
In the decades following this, various advancements were made in the field, allowing Neural Networks to be applied to ever more complex tasks.

How a neural network works

Neural networks consist of a number of layers: the input layer, which receives the data provided to the network; one or more hidden layers, which perform further processing on the data; and finally the output layer, which receives the output from the last hidden layer and produces the final result of the network.

All these layers consist of a predefined number of neurons that perform a specific calculation (the activation function) on their input data and then provide their output to every single neuron in the next layer. It’s good to understand that every neuron will receive multiple inputs but will only output a single value based on those inputs. As stated earlier, how much value a neuron gives to a particular input is determined by a weight (in this simplified picture, a value between 0 and 1).

A practical example: if we want to have our network analyse an image, we’ll provide the colour value of each pixel in the image to a single neuron in the input layer (so for an image of 512×512 we’ll need 262,144 input neurons, or more likely 786,432, providing a neuron for each colour channel Red/Green/Blue).
Every neuron in the input layer will perform its activation function on that value and forward the result to the next layer. For some neurons in the next layer the weight may be set to 0, which means that input is completely ignored; for others it might be 1, which means it has full influence on the output of that particular neuron.

This process continues through all the layers of the network until finally, given the weights are set the right way, a neuron in the output layer will output 1, telling us that this was indeed an image of a cat, while the output neuron that signifies dogs will produce 0.
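As a rough sketch of what that looks like in code (again purely illustrative: a real image network would have one input neuron per pixel or colour channel, not four), here is a tiny forward pass where every neuron feeds every neuron in the next layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights between layers: 4 inputs -> 3 hidden neurons -> 2 outputs.
w_hidden = rng.random((4, 3))  # one weight per input/hidden pair
w_output = rng.random((3, 2))  # one weight per hidden/output pair

def forward(pixels):
    hidden = sigmoid(pixels @ w_hidden)  # each hidden neuron: weighted sum + activation
    return sigmoid(hidden @ w_output)    # the output layer does the same

# Four made-up pixel values in; two scores out (say, “cat” and “dog”).
print(forward(np.array([0.1, 0.7, 0.3, 0.9])))
```

With random weights the two output scores are meaningless, which is exactly the point of the sections below: the weights have to be learned.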

This all sounds kind of magical, as a big collection of very simple things creates such an ‘intelligent’ outcome.
If that sounds impossible, just remember that the phone in your pocket does everything it does using only the 232 operations its CPU is able to perform; we see the magic of simple operations doing great things every single day.

As you can imagine, the ‘magic sauce’ in all of this is making sure all the weights are set to the right values. Doing this manually might have been feasible for McCulloch & Pitts in 1943, but for the networks we use now, counting neurons in the billions, we’ve long passed that stage.

So, let’s have a look at training these models.

Training a Neural network

As we concluded in the previous section, manually configuring these networks to perform sensible work is not very feasible, so here we’ll look at how we put the ‘learning’ in machine learning.

When we want to teach a neural network to perform a task, it needs to be trained (or fitted), which means that we let ‘it’ figure out what the weights in the network should be to get the desired outcome for a given input.

This means that we need a significant collection of inputs for which we already know the correct output, better known as the training set. Looking at our previous example, you would grab a dataset like this, containing 25,000 images that are each labelled as containing either a cat or a dog.

We start by setting up our NN with all its neurons and connections, setting the weights to random values between 0 and 1.

We then grab the first image from our training set and feed it to the NN. Once we have an outcome, we compare it to the expected value and use a technique called back-propagation to slightly adjust all the weights in the network.

Back-propagation is very much the magic sauce of training neural networks. Many flavors and improvements have been thought of since, but the basic idea has existed since the 1980s.

Back-propagation is an algorithm that takes the error (the difference between the expected output and the actual output) and calculates, given the weights, how much each input neuron was ‘to blame’ for this error, adjusting its weights accordingly. This is done for every layer in the network, slightly adjusting all the weights to improve the outcome of the network.

This process is then repeated until the accuracy of the network is to our liking.

At this point we can save all the weights, and we now have our very own neural network to recognize cats and dogs.
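To tie the whole training story together, here is a toy version of that loop (a sketch assuming NumPy; the four-row dataset stands in for the 25,000 labelled images, and back-propagation is written out for a single hidden layer):

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy training set: 4 examples, 2 inputs each, 1 expected output (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Random starting weights (centred around zero here, which helps this
# tiny network converge; the idea is the same as the 0..1 start above).
w1 = rng.random((2, 4)) - 0.5  # input  -> hidden layer
w2 = rng.random((4, 1)) - 0.5  # hidden -> output layer

learning_rate = 0.5
for _ in range(10_000):
    # Forward pass through both layers.
    hidden = sigmoid(X @ w1)
    output = sigmoid(hidden @ w2)

    # The error: difference between expected and actual output.
    error = y - output

    # Back-propagation: work out how much each weight was ‘to blame’.
    grad_out = error * output * (1 - output)              # sigmoid derivative
    grad_hid = (grad_out @ w2.T) * hidden * (1 - hidden)

    # Nudge every weight slightly in the direction that reduces the error.
    w2 += learning_rate * hidden.T @ grad_out
    w1 += learning_rate * X.T @ grad_hid

print(output.round(2))  # should end up close to the expected [0, 1, 1, 0]
```

Real training works the same way, just with far more data, neurons and layers, and with many refinements layered on top of this basic loop.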

Risks of bias

As we have seen, neural networks learn the task they are designed for from copious amounts of examples. They know nothing more than what we provide them; it is therefore important to make sure that any training set used is cleaned of mistakes and provides a well-balanced representation of the data.

For example, if our cats-and-dogs dataset only contained cats with green eyes and dogs with brown eyes, you run the risk that the model will identify a cat with brown eyes as a dog.
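A first, very basic sanity check is simply counting what is actually in your training set. A hypothetical sketch:

```python
from collections import Counter

# Hypothetical (label, eye_colour) pairs from our cats-and-dogs set.
samples = [("cat", "green"), ("cat", "green"),
           ("dog", "brown"), ("dog", "brown")]
print(Counter(samples))
# Counter({('cat', 'green'): 2, ('dog', 'brown'): 2})
# Every cat is green-eyed and every dog brown-eyed: a red flag that the
# network may learn eye colour rather than ‘cat-ness’ vs ‘dog-ness’.
```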

Given the prevalence of Neural Networks in our daily life, the importance of making sure these models are unbiased cannot be overstated.

A well-known real-world example is that, historically, facial recognition worked very well for Caucasians but less so for others, simply because certain other ethnicities were heavily underrepresented in the training set.

Some of these errors in neural networks might be easy to spot, but for other use cases this might not be as obvious. This is also why it is generally advised to make clear when output is generated by AI (where possible) and, if the output is consequential, to make sure a human reviews it before any actions are taken.

Conclusion

I’m aware this is a very short write-up of a fascinating and huge topic, but I hope it gives a high-level idea of how these magical things called neural networks work. I’m looking forward to writing more on the subject in the future, so if this piqued your interest, feel free to contact me with any questions or topics I should cover next.