Back to Glossary
Vision

volutional neural network (CNN)

A convolutional neural network (CNN) is a deep learning architecture primarily used for processing data that has a grid-like topology, such as images. CNNs use convolutional layers to automatically and adaptively learn spatial hierarchies of features from input data, making them highly effective for tasks like image classification, object detection, and image segmentation.

Explanation

CNNs are inspired by the organization of the visual cortex in animals. The core building block is the convolutional layer, which applies a set of learnable filters (kernels) across the input data. These filters detect specific patterns or features. The output of a convolutional layer, called a feature map, represents the locations and strength of the detected features. Multiple convolutional layers are stacked together, allowing the network to learn increasingly complex and abstract features. Pooling layers are often used to reduce the spatial dimensions of the feature maps, decreasing computational cost and increasing robustness to variations in the input. Key components include: Convolutional Layers (feature extraction), Pooling Layers (downsampling), Activation Functions (introducing non-linearity, e.g., ReLU), and Fully Connected Layers (classification). CNNs excel at tasks where spatial relationships in the data are important. Backpropagation is used to train the network, optimizing the filter weights to minimize a loss function. They are used across many domains, including computer vision, natural language processing (for tasks like text classification), and audio processing.

Related Terms