Reinforcement Learning Examples

Large Rewards Accelerate Learning Speed by Extending Brain Signals

Summary: A paradigm-shifting study has upended a decades-long neurological assumption that learning speed depends entirely on repetition and experience rather than the size of a reward. The research ...

29d

Alibaba's Metis agent cuts redundant AI tool calls from 98% to 2% — and gets more accurate doing it

Alibaba's HDPO framework trains AI agents to skip unnecessary tool calls, cutting redundant invocations from 98% to 2% while boosting reasoning accuracy.

Scientific Research Publishing

Ribba, B. (2023) Reinforcement Learning as an Innovative Model-Based Approach: Examples from Precision Dosing, Digital Health and Computational Psychiatry. Frontiers in ...

ABSTRACT: Bipolar disorder (BD) is closely intertwined with abnormalities in sleep and circadian regulation, yet current clinical management typically applies heuristic rules rather than optimizing ...

Microsoft

Experiential Reinforcement Learning

Reinforcement Learning is at the core of building and improving frontier AI models and products. Yet most state-of-the-art RL methods learn primarily from outcomes: a scalar reward signal that says ...

IEEE

Reinforcement Learning for Robust Dynamic Event-Driven Constrained Control

Abstract: We consider a robust dynamic event-driven control (EDC) problem of nonlinear systems having both unmatched perturbations and unknown styles of constraints. Specifically, the constraints ...

acm.org

Rediscovering Reinforcement Learning

Reinforcement learning (RL) is machine learning (ML) in which the learning system adjusts its behavior to maximize the amount of reward and minimize the amount of punishment it receives over time ...

Scientific Research Publishing

Ribba, B. (2023) Reinforcement Learning as an Innovative Model-Based Approach: Examples from Precision Dosing, Digital Health and Computational Psychiatry. Frontiers in ...

ABSTRACT: Depression treatment often involves a complex and lengthy trial-and-error process, where clinicians sequentially prescribe medications to identify the most ...

GitHub

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Our training pipeline is adapted from verl and rllm(DeepScaleR). The installation commands that we verified as viable are as follows: conda create -y -n rlvr_train ...

IEEE

GAME-RL: Generating Adversarial Malware Examples against API Call Based Detection via Reinforcement Learning

Abstract: The adversarial example presents new security threats to trustworthy detection systems. In the context of evading dynamic detection based on API call sequences, a practical approach involves ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results