| Oct 27, 2023 | Biraj sarmah |
The Inner Workings of a Neural Network |4|
Let’s decode the structure of a neural network. We’ll start with a quick recap so that our foundation is solid, and then pursue two goals. The first is to introduce gradient descent, an idea that underlies not only how neural networks learn but also how many other machine learning methods work.
The second is to look more closely at how this particular network performs and at what its hidden layers of neurons are actually doing. As a reminder, our running example is the classic problem of recognizing handwritten digits, often considered the entry point into the world of neural networks. Each digit is rendered on a 28×28 pixel grid, and each pixel has a grayscale value between 0 and 1.
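A minimal sketch of how such an image becomes the network’s input, assuming the raw pixels arrive as 8-bit intensities from 0 to 255 (a common storage format) and that the grid is flattened row by row. The helper names are hypothetical, and plain Python is used so the example has no dependencies.

```python
def scale_bytes(grid):
    """Map 8-bit pixel intensities (0-255) onto the 0-1 grayscale range."""
    return [[v / 255.0 for v in row] for row in grid]


def flatten_image(grid):
    """Flatten a 28x28 grid, row by row, into 784 input activations."""
    if len(grid) != 28 or any(len(row) != 28 for row in grid):
        raise ValueError("expected a 28x28 grid")
    return [float(v) for row in grid for v in row]


# Usage: a blank image with one fully bright pixel at row 2, column 5.
image = [[0] * 28 for _ in range(28)]
image[2][5] = 255
inputs = flatten_image(scale_bytes(image))
print(len(inputs))         # 784
print(inputs[2 * 28 + 5])  # 1.0
```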
These grayscale values set the activations of the 784 (28 × 28) neurons in the network’s input layer. The activation of each neuron in every later layer is then based on a weighted sum of all the activations in the previous layer, plus an extra number specific to that neuron called its bias. That sum is passed through a function such as the sigmoid “squishification” function or a ReLU, as covered earlier.
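The rule for computing one layer can be sketched in plain Python as follows. The names `layer_activations`, `sigmoid`, and `relu` are illustrative rather than taken from any library, and the toy weights and biases are made up purely to show the arithmetic.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def layer_activations(prev, weights, biases, activation=sigmoid):
    """Each neuron applies `activation` to (weighted sum of prev) + its bias.

    prev    -- activations of the preceding layer (length n)
    weights -- one row of n weights per neuron in this layer
    biases  -- one bias per neuron in this layer
    """
    return [
        activation(sum(w * a for w, a in zip(row, prev)) + b)
        for row, b in zip(weights, biases)
    ]


# Usage: a toy layer of two neurons reading three input activations.
prev = [0.0, 0.5, 1.0]
weights = [[1.0, -2.0, 3.0],
           [0.5, 0.5, -1.0]]
biases = [-1.0, 0.25]
print(layer_activations(prev, weights, biases))        # about [0.731, 0.378]
print(layer_activations(prev, weights, biases, relu))  # [1.0, 0.0]
```

The weighted sums here are 1.0 and -0.5, so the sigmoid turns them into values near 0.73 and 0.38, while the ReLU keeps the positive one and zeroes the negative one.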
With our somewhat arbitrary choice of two hidden layers of 16 neurons each, the network has about 13,000 weights and biases that can be adjusted, and these values determine exactly how the network behaves. When we say the network classifies an image as a given digit, we mean that, of the ten neurons in the final layer, the brightest one corresponds to that digit.
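The figure of roughly 13,000 can be checked by counting, for the 784-16-16-10 layout, one weight for every connection between adjacent layers and one bias for every neuron outside the input layer:

```latex
\begin{aligned}
\text{weights} &= 784 \cdot 16 + 16 \cdot 16 + 16 \cdot 10 = 12{,}544 + 256 + 160 = 12{,}960 \\
\text{biases}  &= 16 + 16 + 10 = 42 \\
\text{total}   &= 12{,}960 + 42 = 13{,}002
\end{aligned}
```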
Recall the motivation for this layered structure: we hoped the second layer might pick up on edges, the third on patterns such as loops and lines, and the final layer might piece those patterns together to recognize digits. Now we turn to how the network actually learns.
What we want is an algorithm that can learn from a large body of training data: a collection of images of handwritten digits, each labeled with the digit it represents. Learning means adjusting the network’s roughly 13,000 weights and biases so that its performance on this training data improves.
The hope is that the layered structure will let what the network learns generalize to images beyond the training set. To test this, after training we show the network new labeled images that it never saw during training and measure how accurately it classifies them; the labels are used only to score the answers, not to train.
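Measuring that accuracy can be sketched as follows, assuming each output is a list of ten activations and each label is the correct digit. The predicted digit is the brightest output neuron, and accuracy is the fraction of predictions that match. The output vectors below are invented for illustration, not produced by a trained network.

```python
def predict(output_activations):
    """Return the digit whose output neuron is brightest (index of the largest value)."""
    return max(range(len(output_activations)), key=lambda j: output_activations[j])


def accuracy(outputs, labels):
    """Fraction of examples whose predicted digit matches the true label."""
    if not outputs or len(outputs) != len(labels):
        raise ValueError("need matching, non-empty lists of outputs and labels")
    correct = sum(1 for out, label in zip(outputs, labels) if predict(out) == label)
    return correct / len(outputs)


# Usage: three hypothetical output layers and their true labels.
outputs = [
    [0.1] * 3 + [0.9] + [0.1] * 6,  # brightest neuron: 3
    [0.0] * 7 + [0.8] + [0.2] * 2,  # brightest neuron: 7
    [0.7] + [0.3] * 9,              # brightest neuron: 0
]
labels = [3, 1, 0]
print(accuracy(outputs, labels))  # 0.6666666666666666
```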
Fortunately, the MNIST database provides tens of thousands of labeled images of handwritten digits (60,000 for training and 10,000 for testing). Calling a machine a “learner” can sound like science fiction, but once you see how it works, the process turns out to be less science fiction and more an exercise in calculus.
At its core, learning comes down to finding the minimum of a certain function. Conceptually, each neuron is connected to every neuron in the previous layer, and the weights in the weighted sum that defines its activation act as the strengths of those connections. The bias indicates whether the neuron tends to be active or inactive. To start, we initialize all of these weights and biases completely at random.
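Random initialization might look like the sketch below. The text does not specify a distribution, so drawing each value from a standard normal distribution is an assumption here, as is the hypothetical `init_network` helper; the optional seed only makes runs reproducible.

```python
import random


def init_network(sizes, seed=None):
    """Create random weights and biases for a fully connected network.

    sizes -- neurons per layer, e.g. [784, 16, 16, 10]
    Returns one (weights, biases) pair per non-input layer; weights has one row
    per neuron and one column per neuron in the previous layer.
    Assumption: every value is drawn from a standard normal distribution.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


network = init_network([784, 16, 16, 10], seed=0)
print([(len(w), len(w[0])) for w, _ in network])  # [(16, 784), (16, 16), (10, 16)]
```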
Unsurprisingly, the network then performs terribly on every training example, because its output is essentially random: feed in an image of a 3 and the output layer is a mess. To fix this, you define a cost function, a way of telling the computer that its output should have activations close to 0 for most neurons and 1 for the neuron corresponding to the correct digit. Mathematically, you add up the squares of the differences between each output activation and its desired value. That sum is the cost of a single training example.
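A sketch of the single-example cost, assuming the output is a list of ten activations and the desired output is 1 for the correct digit and 0 elsewhere. The two output vectors are hypothetical, chosen to contrast a perfect answer with a network that lights every neuron halfway.

```python
def one_hot(digit, size=10):
    """Desired output: 1 for the neuron of the correct digit, 0 everywhere else."""
    return [1.0 if j == digit else 0.0 for j in range(size)]


def example_cost(output_activations, digit):
    """Sum of squared differences between the actual and desired output activations."""
    target = one_hot(digit, len(output_activations))
    return sum((a - y) ** 2 for a, y in zip(output_activations, target))


# Usage: the correct digit is 3.
confident = [0.0] * 3 + [1.0] + [0.0] * 6  # exactly the desired output
muddled = [0.5] * 10                       # every neuron half lit
print(example_cost(confident, 3))  # 0.0
print(example_cost(muddled, 3))    # 2.5
```

In the muddled case each of the ten neurons is off by 0.5, contributing 0.25 apiece, for a total of 2.5.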
This sum is small when the network confidently classifies the image correctly, and larger when the network seems not to know what it is doing. You then take the average cost over all of the tens of thousands of training examples. This average measures how badly the network is performing, and therefore how much it needs to improve.
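Averaging over a training set can be sketched like this. The three output vectors are made up to show a correct, a silent, and a confidently wrong response, and `average_cost` is a hypothetical helper name.

```python
def example_cost(output_activations, digit):
    """Squared-error cost of one example against a target of 1 for `digit`, 0 elsewhere."""
    return sum((a - (1.0 if j == digit else 0.0)) ** 2
               for j, a in enumerate(output_activations))


def average_cost(outputs, labels):
    """Mean of the per-example costs over the whole training set."""
    if not outputs or len(outputs) != len(labels):
        raise ValueError("need matching, non-empty lists of outputs and labels")
    total = sum(example_cost(out, digit) for out, digit in zip(outputs, labels))
    return total / len(outputs)


# Usage: three hypothetical output layers.
perfect = [0.0] * 7 + [1.0] + [0.0] * 2  # label 7, answered exactly: cost 0
silent = [0.0] * 10                      # label 2, nothing lights up: cost 1
wrong = [0.0] * 6 + [1.0] + [0.0] * 3    # label 5, confidently says 6: cost 2
print(average_cost([perfect, silent, wrong], [7, 2, 5]))  # 1.0
```

Note that a confidently wrong answer costs more than a silent one, because it is penalized both for the neuron that should be lit and for the one that should not.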
Remember that the network itself is just a function: it takes 784 numbers (the pixel values) as input and produces 10 numbers as output, and its behavior is parameterized by all of these weights and biases. The cost function adds a layer of complexity on top of that. Its inputs are the roughly 13,000 weights and biases, and its output is a single number measuring how bad those weights and biases are. That number is defined by how the network behaves on all of the training data.
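To make the idea of a function of roughly 13,000 inputs concrete, here is a sketch that treats every weight and bias as one entry of a single flat list, slices that list back into layers, runs a sigmoid network on each example, and returns the average squared-error cost. The flat layout, the uniform random parameters, and the blank input image are all assumptions for illustration.

```python
import math
import random

SIZES = [784, 16, 16, 10]  # input layer, two hidden layers, output layer


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unpack(params, sizes=SIZES):
    """Slice one flat list of parameters into per-layer (weights, biases) pairs."""
    layers, i = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [params[i + r * n_in : i + (r + 1) * n_in] for r in range(n_out)]
        i += n_in * n_out
        biases = params[i : i + n_out]
        i += n_out
        layers.append((weights, biases))
    if i != len(params):
        raise ValueError(f"expected {i} parameters, got {len(params)}")
    return layers


def cost(params, examples):
    """Map the flat parameter list to one number: the average cost over `examples`.

    examples -- list of (784 input activations, correct digit) pairs
    """
    layers = unpack(params)
    total = 0.0
    for activations, digit in examples:
        for weights, biases in layers:  # forward pass, layer by layer
            activations = [
                sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
                for row, b in zip(weights, biases)
            ]
        total += sum((a - (1.0 if j == digit else 0.0)) ** 2
                     for j, a in enumerate(activations))
    return total / len(examples)


# Usage: score random parameters on one hypothetical blank image labeled 0.
n_params = sum(n_in * n_out + n_out for n_in, n_out in zip(SIZES[:-1], SIZES[1:]))
rng = random.Random(1)
params = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
print(n_params)                          # 13002
print(cost(params, [([0.0] * 784, 0)]))  # one number, somewhere between 0 and 10
```

Changing any single entry of `params` can change the returned number, which is exactly the sense in which the cost is a function of all the weights and biases at once.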
Telling the network how bad it is, however, isn’t very helpful on its own. We also need a way to tell it how to change its weights and biases so that it gets better. To make this easier to think about, instead of a function with 13,000 inputs, picture a simple function with just one input and one output.
The question then becomes how to find an input that minimizes this function. If you know calculus, you can sometimes find the minimum explicitly by solving for where the slope is zero. For a complicated function like our 13,000-input neural network cost function, however, that approach isn’t feasible.
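As an illustration of the explicit approach, take a hypothetical one-input function C(w) = (w − 3)² + 1, chosen only because it is easy to solve. Setting its derivative to zero gives the minimum directly:

```latex
C(w) = (w - 3)^2 + 1
\qquad\Longrightarrow\qquad
C'(w) = 2(w - 3) = 0
\qquad\Longrightarrow\qquad
w = 3, \quad C(3) = 1
```

For the network’s cost, the analogous step would mean solving about 13,000 simultaneous nonlinear equations, one per weight and bias, which in general has no closed-form solution.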
A more flexible tactic is to start at some input and figure out which direction to step to make the output lower. Concretely, you look at the slope of the function at the current input: if the slope is positive, shift the input to the left; if it is negative, shift it to the right.
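The slope-following idea can be sketched on a toy one-input function. Here the slope is estimated numerically with a central difference rather than computed exactly, the learning rate of 0.1 and the 100 steps are arbitrary choices, and the function `f` is hypothetical, with its minimum placed at x = 3.

```python
def slope(f, x, h=1e-6):
    """Estimate f'(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def descend(f, x0, learning_rate=0.1, steps=100):
    """Repeatedly nudge x against the slope so that f(x) keeps decreasing."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * slope(f, x)
    return x


def f(x):
    """Toy one-input cost with its minimum at x = 3 (chosen only for illustration)."""
    return (x - 3.0) ** 2 + 1.0


x_min = descend(f, x0=-4.0)
print(round(x_min, 4))     # 3.0
print(round(f(x_min), 4))  # 1.0
```

Each step moves x by an amount proportional to the slope, so the steps shrink on their own as the function flattens out near the minimum.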