
AI glossary: 40 definitions to avoid getting lost in the world of artificial intelligence

Written by Nanobaly
Published on 2024-10-24

Artificial intelligence (AI) has become an essential pillar of modern technology, impacting fields as diverse as healthcare, finance, and education.

However, understanding the subtleties of AI can be complex, especially due to the technical jargon often associated with this discipline.

💡 This glossary offers a compilation of 40 key terms, aimed at clarifying the essential concepts of AI and facilitating their understanding for professionals and novices in the field.

Conversational agents (Chatbots)

Chatbots are computer programs that use artificial intelligence to simulate a conversation with users.

They can automatically answer questions, provide information, or complete simple tasks by interacting via text or voice, and are often used on websites and in applications.

Algorithm

An algorithm is a series of specific instructions or steps that a computer program follows to solve a problem or perform a specific task.

In AI, algorithms allow machines to make decisions, learn, or process data automatically and efficiently.
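As a concrete (non-AI) illustration, not taken from the original article, here is a classic algorithm, binary search, in Python: a fixed series of steps that reliably solves one precise task.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # look at the middle element
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1                # discard the lower half
        else:
            high = mid - 1               # discard the upper half
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # -> 3
```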

Data annotation

Data annotation consists of adding specific labels or descriptions to raw data (images, text, videos, etc.) to make them understandable by AI algorithms.

This allows machine learning models to be trained to recognize objects, actions, or concepts in this data.
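To picture what annotation produces, here is a hypothetical Python sketch: raw files paired with human-provided labels that a model can later learn from. The file names, labels, and box coordinates are invented for illustration.

```python
# Hypothetical annotated samples: each raw input is paired with
# human-provided labels that a model can learn from.
annotated_images = [
    {"file": "img_001.jpg", "label": "cat",
     "bounding_box": [34, 50, 200, 180]},   # [x, y, width, height]
    {"file": "img_002.jpg", "label": "dog",
     "bounding_box": [10, 22, 150, 140]},
]

annotated_text = [
    {"text": "Great product, fast delivery!", "sentiment": "positive"},
    {"text": "Arrived broken, very disappointed.", "sentiment": "negative"},
]
```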

Machine learning

Machine learning is a branch of artificial intelligence where machines learn from data without being explicitly programmed.

They identify patterns, make predictions, and improve their performance over time using algorithms, such as in image recognition or machine translation.
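As a minimal sketch, assuming scikit-learn is available and using made-up data, this shows the core idea of learning from examples rather than explicit rules:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: [weight_g, has_wings] -> "bird" or "mammal" (invented labels)
X = [[30, 1], [5000, 0], [150, 1], [70000, 0]]
y = ["bird", "mammal", "bird", "mammal"]

model = DecisionTreeClassifier()
model.fit(X, y)                    # the model learns patterns from the data
print(model.predict([[200, 1]]))   # -> ['bird'], inferred from the examples
```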

Multi-task learning

Multi-task learning is a method where an artificial intelligence model is trained simultaneously on several related tasks.

This allows the model to learn more effectively by sharing knowledge across tasks, thus improving its overall performance across the range of problems to be solved.
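A minimal PyTorch sketch of the idea, assuming PyTorch is installed; the layer sizes and task types are arbitrary choices, not from the article. One shared encoder feeds two task-specific heads, so both tasks learn from the same representation.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """One shared encoder feeding two task-specific heads."""
    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_class = nn.Linear(hidden, 3)  # task 1: 3-way classification
        self.head_value = nn.Linear(hidden, 1)  # task 2: scalar regression

    def forward(self, x):
        h = self.shared(x)                 # representation shared by both tasks
        return self.head_class(h), self.head_value(h)

logits, value = MultiTaskNet()(torch.randn(4, 16))
```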

Reinforcement learning

Reinforcement learning is an AI technique where an agent learns to make decisions by interacting with its environment.

It receives rewards or punishments based on its actions and adjusts its behavior to maximize long-term rewards, as in video games or robotics.
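Here is a minimal sketch of one common reinforcement learning technique, tabular Q-learning; the states, rewards, and learning-rate values are invented for illustration.

```python
# Q-learning update for a tiny problem with 4 states and 2 actions.
Q = [[0.0, 0.0] for _ in range(4)]   # estimated value of each (state, action)
alpha, gamma = 0.1, 0.9              # learning rate and reward discount

def update(state, action, reward, next_state):
    best_next = max(Q[next_state])   # value of the best action afterwards
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0])   # the value of action 1 in state 0 has increased
```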

Supervised learning

Supervised learning is an AI method where a model is trained based on labeled examples.

Each training example is paired with a correct answer, allowing the model to learn to predict similar results for new, unseen data, such as recognizing objects in images or classifying emails.
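A minimal scikit-learn sketch with made-up labeled data: each email is described by two numeric features and a spam/not-spam label the model learns from.

```python
from sklearn.linear_model import LogisticRegression

# Each email is described by [number_of_links, all_caps_word_count]
# and labeled 1 (spam) or 0 (not spam) -- toy labeled data.
X = [[8, 5], [0, 0], [6, 3], [1, 0], [7, 4], [0, 1]]
y = [1, 0, 1, 0, 1, 0]

clf = LogisticRegression()
clf.fit(X, y)                  # learn from labeled examples
print(clf.predict([[5, 4]]))   # -> [1]: predicted spam
```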

Unsupervised learning

Unsupervised learning is an AI method where a model is trained on data without predefined labels or responses.

It must discover hidden patterns or structures by itself, such as grouping similar objects (clustering) or detecting anomalies, without direct human supervision.

Algorithmic bias

Algorithmic bias occurs when an algorithm makes unfair or inequitable decisions due to biases in the data used to train it.

This can lead to discriminatory outcomes or inequalities, affecting specific groups of people or situations, such as in recruitment or facial recognition.

Big Data

Big Data refers to large, complex data sets that are often too large or varied to be processed using traditional methods.

This data comes from a variety of sources (social networks, sensors, etc.) and requires advanced techniques, such as AI and machine learning, to analyze it and extract useful information.

Classification

Classification is a machine learning technique where a model is trained to assign predefined categories or labels to new data.

For example, categorizing emails as “spam” or “non-spam” or recognizing objects in images, such as cats or dogs.

Clustering

Clustering is an unsupervised learning method that consists of grouping similar data into sets called “clusters.”

Unlike classification, there are no predefined labels. The model discovers similarities in the data to create these groups, which are used for market analysis or customer segmentation, for example.
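A minimal scikit-learn sketch with invented customer data, showing k-means discovering two groups without any labels:

```python
from sklearn.cluster import KMeans

# Toy customer data: [annual_spend, visits_per_month], no labels.
customers = [[200, 1], [220, 2], [5000, 20], [4800, 25], [210, 1]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(customers)
print(groups)   # e.g. [0 0 1 1 0]: two segments discovered on their own
```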

Cross Entropy Loss

Cross entropy loss is a loss function used to assess the performance of a classification model. It measures the difference between the model's predictions and the true labels.

The more incorrect the prediction, the greater the loss. Its aim is to minimize this difference in order to improve predictions.
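A minimal NumPy sketch of the computation (the probability values are invented): the wrong-leaning predictions produce a visibly larger loss.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

labels = np.array([[1, 0], [0, 1]])             # true classes (one-hot)
good   = np.array([[0.9, 0.1], [0.2, 0.8]])     # confident, mostly right
bad    = np.array([[0.4, 0.6], [0.6, 0.4]])     # wrong-leaning
print(cross_entropy(labels, good))   # ~0.16: small loss
print(cross_entropy(labels, bad))    # ~0.92: larger loss
```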

Cross-validation

Cross-validation is a technique for evaluating machine learning models. It consists of dividing a data set into several subsets (or “folds”).

The model is trained on some subsets and tested on others. This provides a more reliable estimate of the model's performance and helps detect overfitting.
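A minimal scikit-learn sketch using the built-in iris dataset; the model choice is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, test on the remaining one,
# rotating so every fold serves once as the test set.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())   # average accuracy across the 5 folds
```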

ROC curve and AUC

The ROC (Receiver Operating Characteristic) curve evaluates the performance of a classification model by plotting the true positive rate against the false positive rate.

The AUC (Area Under the Curve) measures the area under this curve. The closer the AUC is to 1, the better the ability of the model to distinguish between classes.
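A minimal scikit-learn sketch with invented labels and scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true   = [0, 0, 1, 1, 1, 0]                  # real classes
y_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]     # model's predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_scores)   # points of the ROC curve
print(roc_auc_score(y_true, y_scores))   # ~0.89: AUC close to 1 = good separation
```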

Dataset

A dataset is an organized collection of data used to train, test, or validate artificial intelligence models.

It may contain text, images, videos, or other types of information, usually labeled, to allow machine learning algorithms to recognize patterns and make predictions.

How about we help you create datasets for your AI models?
🚀 Our team of Data Labelers and Data Trainers can help you build large, high-quality datasets! Feel free to contact us.

Model training

Model training is the process of using a set of data to teach an AI or machine learning model to perform a specific task, such as classification or prediction.

The model adjusts its parameters based on the examples provided, in order to improve its accuracy on new data.

Feature Engineering

Feature engineering is the process of selecting, transforming, or creating new features from raw data in order to improve the performance of a machine learning model.

These features represent the data better and make it easier for the model to identify patterns or make predictions.
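A minimal pandas sketch with invented transaction data: the derived columns (hour, weekend flag, total amount) expose patterns the raw columns hide.

```python
import pandas as pd

raw = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-06 09:00", "2024-01-08 22:30"]),
    "price": [120.0, 80.0],
    "quantity": [2, 5],
})

# Derive new features from the raw columns.
features = pd.DataFrame({
    "hour": raw["timestamp"].dt.hour,                # time-of-day effect
    "is_weekend": raw["timestamp"].dt.dayofweek >= 5,
    "total_amount": raw["price"] * raw["quantity"],  # combined signal
})
print(features)
```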

Loss function

The loss function is a tool used in machine learning to measure the difference between a model's predictions and the actual values. It assesses the accuracy of the model.

The lower the loss, the closer the model's predictions are to the expected results. The model learns by minimizing this loss.
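As a minimal sketch, here is one common loss function, mean squared error, in NumPy (the numbers are invented):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between truth and prediction."""
    return np.mean((np.array(y_true) - np.array(y_pred)) ** 2)

print(mse([3.0, 5.0, 7.0], [2.8, 5.1, 6.5]))  # 0.1: close predictions
print(mse([3.0, 5.0, 7.0], [0.0, 9.0, 1.0]))  # ~20.3: far-off predictions
```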

False positive (False alarm)

A false positive (or false alarm) occurs when a model incorrectly predicts the presence of a condition or class when it is absent.

For example, a spam detection system might classify a legitimate email as spam. This is a common error in classification models.

Natural language generation (NLG)

Natural language generation (NLG) is a sub-field of artificial intelligence that consists of automatically producing understandable text or speech in human language.

It allows a machine to transform raw data into natural sentences or paragraphs, such as in automated summaries or virtual assistants.

Hyperparameters

Hyperparameters are parameters that are defined before an artificial intelligence model is trained and that influence its learning.

Unlike the parameters learned by the model, hyperparameters, such as the learning rate or the size of the neural layers, are set manually and adjusted to optimize the model's performance.

Generative artificial intelligence

Generative artificial intelligence is a branch of AI that creates new content (images, texts, music, etc.) from models trained on existing data.

Using architectures such as GANs (Generative Adversarial Networks), it makes it possible to generate original works by imitating the patterns found in the training data.

Predictive model

A predictive model is an artificial intelligence algorithm designed to anticipate future results based on historical data.

It analyzes past trends to make predictions about new data and is used in various fields such as finance, healthcare, and marketing to predict behaviors or events.

Gradient optimization

Gradient optimization is a technique used to adjust the parameters of an AI model to minimize the loss function.

It consists of computing the slope (gradient) of the loss function and moving the parameters in the direction that reduces the loss, thus improving the performance of the model.
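A minimal sketch on a one-parameter loss, L(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate is an arbitrary choice.

```python
# Minimize the loss L(w) = (w - 3)^2 by gradient descent.
w = 0.0
learning_rate = 0.1
for _ in range(50):
    gradient = 2 * (w - 3)          # slope of the loss at the current w
    w -= learning_rate * gradient   # step in the direction that lowers the loss
print(w)   # converges toward 3, the minimum of the loss
```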

Accuracy

Accuracy is a measure of the performance of a classification model. It represents the percentage of correct predictions among all the predictions made.

It is the ratio between correct predictions (true positives and true negatives) and the total number of predictions. The higher the accuracy, the better the model.

Recall

Recall is a measure of the performance of a classification model. It indicates the model's ability to correctly identify all positive occurrences of a class.

It is the ratio of true positives to the total number of actually positive items. A high recall means few false negatives.
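Both metrics fall out of a confusion matrix. A minimal sketch with invented counts:

```python
# Counts from a hypothetical classifier's confusion matrix.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions overall
recall = tp / (tp + fn)                      # share of positives actually caught
print(accuracy)  # 0.85
print(recall)    # ~0.89: few false negatives
```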

Image recognition

Image recognition is an artificial intelligence technique where a model analyzes images to identify objects, people, places, or actions.

Used in fields such as security, healthcare, and the automotive industry, it allows machines to visually “see” and understand the content of an image for classification or detection purposes.

Voice recognition

Speech recognition is an artificial intelligence technology that converts speech into text. It analyzes the sounds produced by a human voice, identifies the words spoken, and transcribes them.

Used in voice assistants, mobile applications, or voice control systems, it facilitates human-machine interactions.

Regression

Regression is a machine learning technique used to predict continuous values from data.

Unlike classification, which assigns categories, regression estimates numerical values, such as the price of a house or future sales. It establishes relationships between input and output variables to make forecasts.
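A minimal scikit-learn sketch with made-up housing data (here prices happen to follow price = 3 × surface, so the prediction is easy to check):

```python
from sklearn.linear_model import LinearRegression

# Toy data: surface area in m^2 -> sale price in thousands (invented).
X = [[50], [80], [100], [120]]
y = [150, 240, 300, 360]

reg = LinearRegression()
reg.fit(X, y)
print(reg.predict([[90]]))   # a continuous value, here about 270
```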

Artificial neural network

An artificial neural network is an artificial intelligence model inspired by the functioning of the human brain. It is composed of interconnected “neurons”, organized in layers, that process information.

Used for complex tasks such as image recognition or language processing, it learns by adjusting the connections between neurons to improve its performance.
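A minimal NumPy sketch of a forward pass through one hidden layer; the weights are random stand-ins for what training would actually adjust.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer of 4 "neurons": the weight matrices are the
# connections the network would adjust during training.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU activation per neuron
    return hidden @ W2 + b2               # output layer

print(forward(np.array([0.5, -1.0, 2.0])))
```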


Generative adversarial networks (GANs)

GANs (Generative Adversarial Networks) are an artificial intelligence architecture composed of two networks: a generator that creates data and a discriminator that assesses its authenticity.

The two networks compete to improve each other's performance. GANs are used to generate realistic images, videos, and other content.

Deep neural networks (Deep Learning)

Deep neural networks, which underpin deep learning, are AI models composed of multiple layers of interconnected neurons.

Each layer progressively extracts complex characteristics from the raw data, making it possible to solve complex problems such as image recognition, natural language processing or machine translation.

Underfitting

Underfitting occurs when an AI model is too simple to capture the underlying patterns in the data.

The result is poor performance on both training data and new data. The model does not learn enough and makes incorrect predictions.

Overfitting

Overfitting occurs when an AI model is too complex and fits the training data too closely, even capturing noise or anomalies.

Although it performs well on this data, it fails to generalize to new data, making it less reliable for future predictions.

Tokenization

Tokenization is a natural language processing step that involves dividing text into smaller units called “tokens” (words, subwords, or characters).

Each token represents a distinct unit that the AI can process. This step is critical to allow models to analyze and understand the text.
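A deliberately simple sketch of one possible tokenizer (real NLP systems use more sophisticated subword schemes):

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())

print(tokenize("Tokenization splits text into 3 small units!"))
# -> ['tokenization', 'splits', 'text', 'into', '3', 'small', 'units']
```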

Natural Language Processing (NLP)

Natural language processing (NLP) is a field of artificial intelligence that allows machines to understand, analyze, and generate human language.

It is used in applications like voice assistants, machine translation, or text analysis, allowing computers to interact with language in a natural and fluid way.

Transformers

Transformers are a deep learning model architecture used primarily in natural language processing (NLP).

They capture relationships between different elements of a sequence (words, sentences) in parallel, rather than sequentially like traditional models. Transformers are the basis for successful models like GPT and BERT.

Model tuning

Model tuning consists of adjusting the hyperparameters of an artificial intelligence model to optimize its performance.

This process involves testing different combinations of hyperparameters (such as learning rate or layer depth) to find the ones that offer the best results on a given data set.
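A minimal scikit-learn sketch using grid search over a small, arbitrary hyperparameter grid and the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Try every combination of these hyperparameter values with
# cross-validation and keep the best-scoring one.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)   # the combination that scored best
```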

Computer Vision

Computer vision is a branch of artificial intelligence that allows machines to understand and interpret images and videos.

By visually analyzing data, computer vision systems can recognize objects, detect faces, analyze movements, or even automate tasks such as quality inspection or autonomous driving.

😊 We hope you found this glossary useful in demystifying some of the key concepts in artificial intelligence. If you want to know more about AI, its applications, or how creating high-quality datasets can contribute to the success of your projects, don't hesitate to contact Innovatiana. Our team of experts is at your disposal to support you in all your initiatives related to AI and data management.