| Oct 28, 2023 | Biraj sarmah |
The Inner Workings of Neural Network |6|
Certain connections hold greater relevance within our training data. One approach to interpreting this gradient vector for our immensely complex cost function is that it encodes the relative importance of each weight and bias, signifying which alterations have the most significant impact.
This offers an alternative perspective on direction. For a simpler example, envision a function with two variables as input. If, for instance, the gradient at a specific point is the vector (3, 1), it implies that when you stand at that input, moving in this direction increases the function’s value most rapidly.
Now, when you graph the function above the plane of input points, that vector represents the direct uphill direction. Another way to interpret this is that changes to the first variable carry three times the significance of changes to the second variable. At least in the vicinity of the relevant input, nudging the x-value has a much larger effect on the function than nudging the y-value by the same amount.
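To make the two-variable picture concrete, here is a minimal sketch in Python. The function f below is a hypothetical example, not something from the original series; it is chosen only because its gradient at the point (1, 1) works out to (3, 1). The gradient is estimated with finite differences, and then the same small nudge is applied to each variable in turn.

```python
def f(x, y):
    # Hypothetical example function (an assumption for illustration),
    # chosen so that its gradient at (1, 1) is (3, 1).
    return 1.5 * x**2 + 0.5 * y**2


def numerical_gradient(func, x, y, h=1e-6):
    """Approximate the gradient of func at (x, y) using central differences."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


grad_x, grad_y = numerical_gradient(f, 1.0, 1.0)

# Apply the same small nudge to each variable and compare the effect on f.
nudge = 1e-3
change_from_x = f(1.0 + nudge, 1.0) - f(1.0, 1.0)
change_from_y = f(1.0, 1.0 + nudge) - f(1.0, 1.0)
ratio = change_from_x / change_from_y

print(grad_x, grad_y)  # close to 3 and 1
print(ratio)           # close to 3
```

Near that point, the nudge to x moves the function about three times as far as the same nudge to y, which is what the components of the gradient are telling you.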
In summary, let’s revisit where we stand at this point. The network itself acts as a function with 784 inputs and 10 outputs, delineated in terms of these weighted sums. The cost function introduces an additional layer of complexity. It takes the approximately 13,000 weights and biases as inputs and provides a single quantification of performance based on the training examples.
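The figure of roughly 13,000 can be checked with a few lines of Python. This sketch assumes the fully connected layout described in this post: 784 inputs, two hidden layers of 16 neurons each, and 10 outputs. The helper name count_parameters is just an illustrative choice.

```python
# Layer sizes assumed from the post: 784 inputs, two hidden layers of 16, 10 outputs.
layer_sizes = [784, 16, 16, 10]


def count_parameters(sizes):
    """Count weights and biases in a fully connected network with these layer sizes."""
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(sizes[1:])
    return weights, biases


weights, biases = count_parameters(layer_sizes)
total = weights + biases
print(weights, biases, total)  # 12960 42 13002
```

So the cost function takes 13,002 numbers as input and returns a single number.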
The gradient of the cost function adds yet another layer of complexity. It conveys which adjustments to the weights and biases produce the most rapid shifts in the cost function’s value, effectively highlighting which changes to which weights matter the most.
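Gradient descent puts this information to work by repeatedly stepping against the gradient. The sketch below is a toy illustration under stated assumptions: the cost has only two parameters instead of roughly 13,000, and both the cost function and the learning rate are hypothetical choices made so the example is easy to follow. It is not the training code for the digit network.

```python
def cost(w, b):
    # Hypothetical toy cost (an assumption) with its minimum at w = 2, b = -1.
    return (w - 2.0) ** 2 + 3.0 * (b + 1.0) ** 2


def cost_gradient(w, b):
    # Analytic partial derivatives of the toy cost above.
    return 2.0 * (w - 2.0), 6.0 * (b + 1.0)


def gradient_descent(w, b, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, the direction of steepest decrease."""
    for _ in range(steps):
        dw, db = cost_gradient(w, b)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


start_cost = cost(0.0, 0.0)
w_final, b_final = gradient_descent(w=0.0, b=0.0)
final_cost = cost(w_final, b_final)

print(w_final, b_final)        # close to 2 and -1
print(start_cost, final_cost)  # the cost shrinks toward 0
```

Notice that the b-component of the gradient is larger relative to its distance from the minimum, so b settles faster than w. That is the same "relative importance" reading of the gradient, just in two dimensions.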
Now, when you initiate the network with random weights and biases and iteratively adjust them through the gradient descent process, how well does it perform on unseen images? For the network structure I’ve detailed, featuring two hidden layers of 16 neurons each, predominantly chosen for aesthetic reasons, its performance is commendable.
It accurately classifies approximately 96 percent of new images. In fact, when you consider some of the images it stumbles on, you might find yourself inclined to cut it some slack.
Should you choose to experiment with the hidden layer structure and make a few adjustments, you can elevate this accuracy to 98 percent, which is certainly noteworthy. While it might not be the absolute best performance attainable, considering the daunting nature of the initial task and the fact that the network wasn’t explicitly instructed on what patterns to discern, the achievement remains rather remarkable.
Initially, we had hoped that the second layer would capture minor edges, the third layer would combine these edges to identify loops and extended lines, and these patterns would culminate in the recognition of digits. However, for this particular network, such aspirations don’t align with reality. In another post, we explored how the weights associated with the connections from the first layer’s neurons to a given neuron in the second layer can be visualized as a pixel pattern. However, when we perform this visualization for the weights of the trained network, the patterns do not pick out isolated edges. Instead, they look almost random, albeit with some vague structure within.
It appears that in the immensely vast 13,000-dimensional space of potential weights and biases, our network has converged towards a local minimum that, despite proficiently classifying most images, doesn’t precisely capture the patterns we originally hoped for. To emphasize this point, observe what occurs when you input a random image.
If the system were truly intelligent, one might expect it to exhibit uncertainty, possibly not strongly activating any of those ten output neurons or activating them evenly. However, it confidently provides an absurd answer, as if it’s just as sure that this random noise represents a 5 as it is that an actual image of a 5 is indeed a 5.
In simpler terms, while this network can proficiently recognize digits, it has no clue how to draw them. Much of this limitation stems from the highly constrained training setup. Put yourself in the network’s position for a moment. As far as it knows, the entire universe comprises clearly defined, motionless digits centered in a small grid, and its cost function never gave it any reason to be anything but completely certain in its judgments.
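A small sketch shows why the cost function pushes toward certainty. It assumes a squared-error cost against a target that is 1 for the correct digit and 0 for the other nine; that choice is an assumption for this example, and the output values are made up for illustration rather than taken from a trained network.

```python
def squared_error_cost(output, target):
    """Sum of squared differences between network output and desired output."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


# Desired output for an image of a 5: only the neuron for digit 5 is active.
target_five = [1.0 if digit == 5 else 0.0 for digit in range(10)]

# Two made-up outputs: one very sure of "5", one spread evenly over all digits.
confident = [0.99 if digit == 5 else 0.01 for digit in range(10)]
hedged = [0.1] * 10

confident_cost = squared_error_cost(confident, target_five)
hedged_cost = squared_error_cost(hedged, target_five)

print(confident_cost < hedged_cost)  # True
```

Every training image is a real digit, so a hedged answer is always penalized and a confident one is always rewarded. Nothing in this setup ever shows the network an input for which uncertainty would be the right response.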