Reinforcement learning sits between supervised and unsupervised learning.
The algorithm is given unlabeled data, or placed in an environment, and chooses an action for each input; it then receives feedback (sometimes supplied by a human) that guides further learning.
- Unlabeled Data: Unlike supervised learning, reinforcement learning does not require labeled input/output pairs. It works with unlabeled data or an environment in which the learning agent operates.
- Actions and Feedback Loop: The learning agent makes decisions or takes actions within its environment. These actions are not derived from predefined labels but are selected according to the agent’s current policy, which is refined through learning.
- Feedback – Rewards and Penalties: After taking an action, the agent receives feedback in the form of a reward or penalty (a positive or negative reinforcement signal). This feedback is what drives the learning process.
- Learning from Consequences: The agent learns from the consequences of its actions rather than from being told explicitly what to do. Over time, the agent identifies which actions tend to yield the highest rewards.
- Goal-Oriented: The process is typically goal-oriented, aiming to maximize the cumulative reward. The agent learns to make sequences of decisions that lead to the best possible outcome.
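The loop described above can be sketched with tabular Q-learning on a toy environment. This is a minimal illustration, not a production algorithm: the chain environment, hyperparameters, and reward scheme are all invented for the example. The agent starts with no labels, tries actions under an epsilon-greedy policy, and updates its value estimates from the rewards it observes.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain environment.

    States are 0..n_states-1; action 1 moves right, action 0 moves left.
    Reaching the last state yields reward +1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy policy: explore occasionally, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state = (min(state + 1, n_states - 1) if action == 1
                          else max(state - 1, 0))
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Learn from the consequence: move the estimate toward
            # reward + discounted value of the best next action.
            q[state][action] += alpha * (
                reward + gamma * max(q[next_state]) - q[state][action])
            state = next_state
    return q

q = q_learning()
# Greedy policy after training: the preferred action in each non-terminal state.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
```

No one ever tells the agent that "move right" is correct; it discovers this because right-moving actions accumulate the highest discounted reward, which is the goal-oriented behavior the list above describes.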
In the context of AML/CFT, reinforcement learning can be used to develop systems that adapt to changing financial crime patterns. For instance, an AML system powered by reinforcement learning can continuously adjust its transaction monitoring strategy based on feedback from confirmed cases of fraud or money laundering, thereby becoming more effective over time.
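One simplified way to sketch this idea is an epsilon-greedy bandit that picks among candidate alert thresholds and learns from analyst feedback. Everything here is hypothetical: the threshold values, the reward scheme (+1 for a confirmed-suspicious alert, -0.2 for a false positive), and the simulated transaction stream are illustrative assumptions, not real AML parameters or data.

```python
import random

class ThresholdBandit:
    """Epsilon-greedy bandit over candidate alert thresholds (illustrative)."""

    def __init__(self, thresholds, epsilon=0.1, seed=0):
        self.thresholds = thresholds
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(thresholds)   # pulls per threshold
        self.values = [0.0] * len(thresholds) # running mean reward per threshold

    def choose(self):
        # Mostly exploit the best-looking threshold, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.thresholds))
        return max(range(len(self.thresholds)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        # Incremental mean of observed rewards for this threshold.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulate(rounds=5000):
    # Toy world (made-up numbers): amounts >= 10_000 are suspicious 30% of
    # the time, smaller amounts only 1% of the time.
    bandit = ThresholdBandit([1_000, 10_000, 50_000])
    rng = random.Random(1)
    for _ in range(rounds):
        arm = bandit.choose()
        amount = rng.uniform(0, 100_000)
        if amount >= bandit.thresholds[arm]:  # transaction flagged for review
            suspicious = rng.random() < (0.3 if amount >= 10_000 else 0.01)
            reward = 1.0 if suspicious else -0.2  # simulated analyst feedback
        else:
            reward = 0.0  # no alert raised, no feedback received
        bandit.update(arm, reward)
    return bandit

bandit = simulate()
```

The key point is the feedback loop: each analyst disposition becomes a reward signal, so the system's monitoring strategy shifts toward thresholds that catch confirmed cases without drowning reviewers in false positives. A real deployment would involve far richer state (customer profiles, transaction histories) and careful safeguards, which this sketch deliberately omits.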