
Vector Quantization (VQ)

Vector quantization (VQ) is a lossy data compression technique that reduces the amount of data needed to represent vectors by drawing on a limited set of representative vectors (codewords) from a codebook. It works by dividing a high-dimensional vector space into a number of regions and approximating every vector within a region by that region's centroid (its codeword).
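The encode/decode round trip can be sketched in a few lines. This is a minimal illustration assuming NumPy, with a hand-picked toy codebook (the values and the `encode`/`decode` helper names are assumptions for this sketch, not part of any standard API):

```python
import numpy as np

# A toy codebook of 4 codewords in 2-D (assumed values for illustration).
codebook = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])

def encode(vectors, codebook):
    """Map each input vector to the index of its nearest codeword."""
    # Pairwise squared Euclidean distances, shape (N, K).
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def decode(indices, codebook):
    """Reconstruct each vector as the codeword at its stored index."""
    return codebook[indices]

x = np.array([[0.9, 0.1], [0.2, 0.8]])
idx = encode(x, codebook)      # -> array([1, 2])
x_hat = decode(idx, codebook)  # -> [[1., 0.], [0., 1.]]
```

Note that each input vector is stored as a single integer index, while the reconstruction keeps the original dimensionality; the loss is the gap between `x` and `x_hat`.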

Explanation

Vector quantization operates by first creating a 'codebook': a collection of representative vectors learned from a training dataset, typically with an algorithm such as k-means clustering or the Linde-Buzo-Gray (LBG) algorithm. During encoding, an input vector is compared to each codeword in the codebook, and the index of the closest codeword is selected as the encoded representation. Closeness is usually measured with a distance metric such as Euclidean distance (or a similarity measure such as cosine similarity). Decoding simply retrieves the codeword at the encoded index from the codebook.

VQ is a lossy compression method because the original vector is replaced by an approximation. The trade-off between compression and distortion is governed by the codebook size: a smaller codebook yields higher compression (fewer bits per index) but greater distortion. For example, a codebook of 256 codewords encodes each vector as a single 8-bit index, regardless of the vector's dimensionality.

VQ is used in various applications, including image and audio compression, speech recognition, and data clustering. In machine learning, vector quantization can be used for tasks such as feature extraction and dimensionality reduction. It is closely related to clustering techniques, since the codebook can be viewed as a set of cluster centers. It can also be useful for pre-processing data for more complex machine learning models, as well as for vector retrieval.
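The codebook-learning step described above can be sketched with plain k-means (Lloyd's algorithm). This is a minimal sketch assuming NumPy; `train_codebook` is a hypothetical helper name, and the two-blob toy data is an assumption chosen so the learned codewords are easy to interpret:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_codebook(data, k, iters=20):
    """Learn k codewords with plain k-means (Lloyd's algorithm)."""
    # Initialize codewords as k distinct random training vectors.
    codebook = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword.
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Move each codeword to the centroid of its region.
        for j in range(k):
            members = data[assign == j]
            if len(members):  # keep the old codeword if its region is empty
                codebook[j] = members.mean(axis=0)
    return codebook

# Toy training data: two tight Gaussian blobs in 2-D.
data = np.vstack([
    rng.normal([0.0, 0.0], 0.1, size=(100, 2)),
    rng.normal([5.0, 5.0], 0.1, size=(100, 2)),
])
cb = train_codebook(data, k=2)
# With k=2, each vector is stored as a 1-bit index instead of two floats.
```

Here each codeword ends up near one blob's center, which is the sense in which the codebook doubles as a set of cluster centers.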

Related Terms