Back to Glossary
Machine Learning

Categorisation

Categorisation, also known as classification, is the process of assigning predefined labels or categories to data points based on their characteristics. In machine learning, it's a supervised learning task where an algorithm learns to map input features to specific categories using labeled training data.

Explanation

Categorisation algorithms aim to learn a function that accurately predicts the category of new, unseen data instances. The algorithm analyzes the features of training data and identifies patterns or relationships that distinguish different categories. Common algorithms include logistic regression, support vector machines (SVMs), decision trees, random forests, and neural networks. The performance of a categorisation model is evaluated using metrics such as accuracy, precision, recall, F1-score, and AUC. Categorisation plays a crucial role in many applications, including spam detection, image recognition, sentiment analysis, and medical diagnosis. The effectiveness of a categorisation model heavily depends on the quality and representativeness of the training data and the appropriate selection and tuning of the algorithm and its hyperparameters. Imbalanced datasets (where some categories have significantly fewer examples than others) can pose a challenge and often require specialized techniques like oversampling, undersampling, or cost-sensitive learning.

Related Terms