**Oct 27, 2023 | Biraj Sarmah**

**The Inner Workings of Neural Networks | Part 3**

The weights tell you what pixel pattern this neuron in the second layer is picking up on. And the bias tells you how high the weighted sum needs to be before the neuron starts getting meaningfully active. And that is just one neuron. Every other neuron in this layer is going to be connected to all 784 pixel neurons from the first layer.

And each one of those 784 connections has its own weight associated with it. Also, each one has some bias, some other number that you add on to the weighted sum before squishing it with the sigmoid. And that’s a lot to think about! With this hidden layer of 16 neurons, that’s a total of 784 times 16 weights along with 16 biases.

And all of that is just the connections from the first layer to the second. The connections between the other layers also have a bunch of weights and biases associated with them. All said and done, this network has almost exactly 13,000 total weights and biases. 13,000 knobs and dials that can be tweaked and turned to make this network behave in different ways.
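That parameter count is easy to verify yourself. Here's a minimal sketch, assuming the 784 → 16 → 16 → 10 architecture described above (every layer fully connected, one bias per neuron):

```python
# Counting parameters in a fully connected 784 -> 16 -> 16 -> 10 network
layer_sizes = [784, 16, 16, 10]

# Each pair of adjacent layers contributes (inputs x outputs) weights
weights = sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))
# Every neuron past the input layer gets one bias
biases = sum(layer_sizes[1:])

print(weights, biases, weights + biases)  # 12960 42 13002
```

So "almost exactly 13,000" works out to 13,002 parameters.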

So when we talk about learning, what that’s referring to is getting the computer to find a valid setting for all of these many, many numbers so that it’ll actually solve the problem at hand. One thought experiment that is at once fun and kind of horrifying is to imagine sitting down and setting all of these weights and biases by hand, purposefully tweaking the numbers so that the second layer picks up on edges, the third layer picks up on patterns, et cetera.

I personally find this satisfying, rather than just treating the network as a total black box. When the network doesn't perform the way you anticipate, having built up a little bit of a relationship with what those weights and biases actually mean gives you a starting place for experimenting with how to change the structure to improve.

Or, when the network does work, but not for the reasons you might expect, digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions. By the way, the actual function here is a little cumbersome to write down, don't you think?

So, let me show you a more notationally compact way that these connections are represented. This is how you’d see it if you choose to read up more about neural networks. Organize all of the activations from one layer into a column as a vector. Then organize all of the weights as a matrix, where each row of that matrix corresponds to the connections between one layer and a particular neuron in the next layer.

What that means is that taking the weighted sum of the activations in the first layer, according to these weights, corresponds to one of the terms in the matrix vector product of everything we have on the left here.
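In symbols (using $W$ for the weight matrix and $a$ for the activation vector, names that are standard but not fixed by the text above), the weighted sum feeding the $i$-th neuron of the next layer is just the $i$-th component of the matrix-vector product:

$$
(Wa)_i = \sum_{j=1}^{784} w_{ij}\, a_j
$$

Each row of $W$ holds the 784 weights belonging to one neuron, so one row times the activation column gives that one neuron's weighted sum.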

By the way, so much of machine learning just comes down to having a good grasp of linear algebra, so for any of you who want a nice visual understanding for matrices and what matrix vector multiplication means, take a look at the series I did on linear algebra, especially chapter 3. Back to our expression, instead of talking about adding the bias to each one of these values independently, we represent it by organizing all those biases into a vector.

And adding the entire vector to the previous matrix vector product. Then, as a final step, I'll wrap a sigmoid around the outside here. And what that's supposed to represent is that you're going to apply the sigmoid function to each specific component of the resulting vector inside. So, once you write down this weight matrix and these vectors as their own symbols, you can communicate the full transition of activations from one layer to the next in an extremely tight and neat little expression.

And this makes the relevant code both a lot simpler and a lot faster, since many libraries optimize the heck out of matrix multiplication. Remember how earlier I said these neurons are simply things that hold numbers? Well, of course, the specific numbers that they hold depends on the image you feed in.

So it’s actually more accurate to think of each neuron as a function, one that takes in the outputs of all the neurons in the previous layer and spits out a number between 0 and 1. Really, the entire network is just a function, one that takes in 784 numbers as an input and spits out 10 numbers as an output.
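Seen that way, the whole network is just the layer transition applied repeatedly. A minimal sketch, again assuming the 784 → 16 → 16 → 10 architecture and random stand-in parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, params):
    # Apply sigmoid(W @ a + b) once per layer, feeding each
    # layer's output into the next
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)
sizes = [784, 16, 16, 10]
params = [(rng.standard_normal((n, m)) * 0.1, np.zeros(n))
          for m, n in zip(sizes, sizes[1:])]

out = forward(rng.random(784), params)
print(out.shape)  # (10,)
```

Feed in 784 numbers, get back 10 numbers, each between 0 and 1: the entire network really is just one (big) function.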

It’s an absurdly complicated function, one that involves 13,000 parameters in the form of these weights and biases that pick up on certain patterns, and which involves iterating many matrix vector products and the sigmoid squishification function, but it’s just a function nonetheless. And in a way, it’s kind of reassuring that it looks complicated.
