| Oct 28, 2023 | Biraj sarmah |
The Inner Workings of Neural Networks |5|
The key concept here is finding the minimum of a particular function. In simple terms, think of each neuron as connected to every neuron in the preceding layer. The weights in the weighted sum that determines its activation represent the strength of those connections. The bias, on the other hand, indicates whether that neuron tends to be active or inactive. To start, we initialize all of the weights and biases randomly.
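To make the weighted sum concrete, here is a minimal Python sketch of a single neuron. It assumes a sigmoid activation (this section does not say which function squashes the weighted sum) and Gaussian random initialization. The names `sigmoid` and `neuron_activation` are hypothetical helpers, not part of any library.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into the range (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe for large negative z
    return e / (1.0 + e)

def neuron_activation(prev_activations, weights, bias):
    """Weighted sum of every activation in the preceding layer, plus the bias, then sigmoid."""
    z = sum(w * a for w, a in zip(weights, prev_activations)) + bias
    return sigmoid(z)

random.seed(0)
prev = [random.random() for _ in range(784)]        # stand-in for 784 pixel values
weights = [random.gauss(0, 1) for _ in range(784)]  # random initial weights
bias = random.gauss(0, 1)                           # random initial bias
activation = neuron_activation(prev, weights, bias)
print(activation)
```

Because the weights and bias are random, the resulting activation is arbitrary, which is exactly why the untrained network's output looks random.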
Understandably, at this stage the network performs poorly on any given training example, since it is essentially producing random results. For example, if you input an image of the number 3, the output layer looks like a mess. To address this, we introduce a cost function: a way of telling the computer that its output should have activations close to 0 for most neurons and close to 1 for the neuron corresponding to the correct digit. Mathematically, we add up the squares of the differences between each output activation and the value we want it to have. The result is the cost of a single training example.
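A minimal Python sketch of the single-example cost follows. The helper names `one_hot` and `example_cost` are hypothetical, and the output activations below are made-up numbers standing in for an untrained network's response to an image of a 3.

```python
def one_hot(digit, num_classes=10):
    """Desired output: 1.0 for the correct digit, 0.0 for every other neuron."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]

def example_cost(output, digit):
    """Sum of squared differences between the produced and desired activations."""
    target = one_hot(digit)
    return sum((o - t) ** 2 for o, t in zip(output, target))

# Made-up, random-looking output for an image of a 3
messy_output = [0.43, 0.28, 0.19, 0.88, 0.72, 0.01, 0.64, 0.86, 0.99, 0.63]
cost = example_cost(messy_output, 3)
print(f"cost for this example: {cost:.4f}")
```

Feeding in the desired output itself, `example_cost(one_hot(3), 3)`, gives a cost of 0, the best possible score.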
This sum is small when the network confidently classifies the image correctly, but it grows large when the network seems unsure of what it is doing. We then take the average cost over all of the tens of thousands of training examples at our disposal. This average cost is our measure of how well the network performs, and therefore of how much it still needs to improve.
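Averaging over the training set is a short extension of the per-example cost. This sketch uses a tiny made-up dataset with only three output classes for brevity. `example_cost` and `average_cost` are hypothetical helper names.

```python
def example_cost(output, target):
    """Sum of squared differences for one training example."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

def average_cost(outputs, targets):
    """Mean of the per-example costs over the whole training set."""
    costs = [example_cost(o, t) for o, t in zip(outputs, targets)]
    return sum(costs) / len(costs)

# Made-up outputs and desired outputs for two examples
outputs = [[0.9, 0.1, 0.0],   # confident and correct
           [0.2, 0.7, 0.1]]   # correct, but less confident
targets = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
avg = average_cost(outputs, targets)
print(avg)
```

The confident example contributes 0.02 and the less confident one contributes 0.14, so the average cost is 0.08.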
It’s worth keeping in mind that the network itself is just a function: one that takes in 784 numbers (the pixel values) and produces 10 numbers as output, with its behavior determined by all of its weights and biases. The cost function adds another layer of complexity on top of that. Its input is the roughly 13,000 weights and biases, and its output is a single number describing how bad those weights and biases are. That number is defined by how the network behaves across the entire training dataset.
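Where does "roughly 13,000" come from? Here is a quick count in Python, assuming two hidden layers of 16 neurons each. That 784-16-16-10 layout is a common choice for this example, but this section does not itself state the hidden-layer sizes, so treat it as an assumption. `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each neuron has one weight per neuron in the preceding layer and one bias;
    the input layer has no weights or biases of its own.
    """
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

# Assumed layout: 784 inputs, two hidden layers of 16 neurons, 10 outputs
total = count_parameters([784, 16, 16, 10])
print(total)
```

With that layout the count comes to 13,002 (12,960 weights plus 42 biases), so the cost function takes 13,002 numbers as input.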
However, simply telling the network how badly it is doing isn’t very helpful. What we need is a way to tell it how to change those weights and biases so that it improves. To make this easier to think about, instead of a function with 13,000 inputs, picture a simple function with just one input and one output.
The key challenge then becomes finding an input value that minimizes this function. If you know calculus, you can sometimes find the minimum explicitly by solving for where the slope is zero. For a function as complicated as our 13,000-input neural network cost function, however, that is not practical.
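As a small worked example of the explicit approach, take a one-input function chosen purely for illustration (it does not come from the network), f(x) = (x - 3)^2 + 1. Setting its derivative to zero gives the minimum directly:

```latex
\begin{aligned}
f(x)  &= (x - 3)^2 + 1 \\
f'(x) &= 2(x - 3) = 0 \;\Longrightarrow\; x = 3, \quad f(3) = 1
\end{aligned}
```

For the network's cost function, the analogous step would mean solving for where all 13,000 partial derivatives vanish at once, which is what makes this route impractical.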
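A more flexible tactic is to start at any input and work out which direction to step to make the output lower. Specifically, compute the slope of the function at the current point, then shift the input to the left if the slope is positive and to the right if it is negative. If you repeat this process while making each step proportional to the slope, the steps shrink as the slope flattens out near a minimum, which helps prevent overshooting.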
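The slope-following procedure can be sketched in a few lines of Python. This sketch assumes a made-up toy function f(x) = (x - 3)^2 + 1 with a known minimum at x = 3, estimates the slope numerically, and uses an arbitrary learning rate of 0.1. `slope` and `descend_1d` are hypothetical names.

```python
def f(x):
    """Toy one-input, one-output function with its minimum at x = 3."""
    return (x - 3) ** 2 + 1

def slope(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def descend_1d(func, x, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope; steps shrink as the slope flattens."""
    for _ in range(steps):
        # Positive slope -> move left; negative slope -> move right
        x -= learning_rate * slope(func, x)
    return x

x_min = descend_1d(f, x=-5.0)
print(f"minimum near x = {x_min:.4f}")
```

Because each step is the learning rate times the slope, the steps automatically get smaller as x approaches 3, where the slope goes to zero.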
It’s important to remember that the cost function here is an average over all of the training data. Minimizing it therefore means improving performance across all of those samples, not just one.
The algorithm for efficiently computing the gradient of this cost function with respect to every weight and bias, which is effectively the heart of how a neural network learns, is called backpropagation. In the next post, I’ll go through what happens to each weight and bias for a single training example, aiming for an intuitive picture rather than a pile of calculus and formulas.
For now, setting aside implementation details, the key point is that when we talk about a network learning, we mean minimizing a cost function. One consequence is that the cost function needs a smooth output, so that we can find a local minimum by taking small steps downhill.
This is why artificial neurons have continuously ranging activations, rather than the simple active/inactive (binary) states of biological neurons. The iterative process of nudging a function’s input by some multiple of the negative gradient is called gradient descent.
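The following Python sketch illustrates why smoothness matters, assuming a sigmoid as the continuous activation. A binary step function has zero slope everywhere except at its jump, so small nudges to its input give gradient descent no information about which way to go, whereas the sigmoid's slope is positive everywhere. The `step`, `sigmoid`, and `slope` helpers are hypothetical.

```python
import math

def step(z):
    """Binary neuron: fully active or fully inactive."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Continuously ranging activation between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def slope(func, z, h=1e-4):
    """Central-difference estimate of the derivative of func at z."""
    return (func(z + h) - func(z - h)) / (2 * h)

for z in (-2.0, -0.5, 0.5, 2.0):
    print(f"z={z:+.1f}  step slope={slope(step, z):.3f}  "
          f"sigmoid slope={slope(sigmoid, z):.3f}")
```

Every step slope printed is zero, while every sigmoid slope is nonzero, so only the sigmoid tells us which way a small nudge will move the cost.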
It is a way to converge toward a local minimum of a cost function, much like rolling downhill into a valley on a graph. I’m still illustrating a function with two inputs, since a 13,000-dimensional input space is hard to picture, but there is also a non-spatial way to think about it.
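Gradient descent with two inputs can be sketched in Python as follows. The bowl-shaped cost x^2 + 3y^2, its hand-derived gradient, the starting point, and the learning rate are all illustrative assumptions. In a real network the gradient would come from backpropagation rather than from a formula written by hand.

```python
def cost(v):
    """Toy two-input 'valley' with its minimum at (0, 0)."""
    x, y = v
    return x ** 2 + 3 * y ** 2

def gradient(v):
    """Partial derivatives of cost with respect to x and y."""
    x, y = v
    return (2 * x, 6 * y)

def gradient_descent(v, learning_rate=0.1, steps=200):
    """Repeatedly nudge the input by a multiple of the negative gradient."""
    for _ in range(steps):
        g = gradient(v)
        v = tuple(vi - learning_rate * gi for vi, gi in zip(v, g))
    return v

start = (4.0, -2.0)
end = gradient_descent(start)
print(end, cost(end))
```

Each update is exactly the nudge by a multiple of the negative gradient described above; the same loop would apply unchanged to a 13,000-component vector.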
Each component of the negative gradient tells us two things. Its sign says whether the corresponding component of the input vector should be nudged up or down. Just as important, the relative magnitudes of the components tell us which changes matter more: in our network, adjusting one weight may affect the cost function much more than adjusting another.
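The two readings of the negative gradient, sign and relative magnitude, can be sketched in Python. The gradient values and weight names below are made up for illustration, and `interpret_negative_gradient` is a hypothetical helper.

```python
def interpret_negative_gradient(grad, names):
    """Return the nudge direction for each parameter and the parameters ranked by impact."""
    negative = {name: -g for name, g in zip(names, grad)}
    # Sign: positive means nudge the parameter up, negative means nudge it down
    directions = {
        name: "up" if v > 0 else "down" if v < 0 else "leave"
        for name, v in negative.items()
    }
    # Magnitude: a larger |component| means that change matters more to the cost
    ranked = sorted(negative, key=lambda name: abs(negative[name]), reverse=True)
    return directions, ranked

# Made-up gradient of the cost with respect to four hypothetical weights
grad = [0.31, -0.03, 1.66, -0.08]
directions, ranked = interpret_negative_gradient(grad, ["w0", "w1", "w2", "w3"])
print(directions)
print(ranked)
```

Here the sketch says to nudge `w0` and `w2` down and `w1` and `w3` up, and that the change to `w2` matters most for reducing the cost.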