Conference item
An alternative to variance: Gini deviation for risk-averse policy gradient
- Abstract:
- Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Accepted manuscript (pdf, 6.7MB)
- Publisher copy:
- 10.52202/075280-2662
Authors
- Funder:
- Chinese University of Hong Kong
- Funder identifier:
- https://ror.org/00t33hh48
- Grant:
- UDF01002911
- Publisher:
- Curran Associates
- Host title:
- Advances in Neural Information Processing Systems 36
- Volume:
- 36
- Pages:
- 60922-60946
- Publication date:
- 2024-07-01
- Event title:
- 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
- Event location:
- New Orleans, Louisiana, USA
- Event website:
- https://neurips.cc/Conferences/2023
- Event start date:
- 2023-12-10
- Event end date:
- 2023-12-16
- DOI:
- ISSN:
- 1049-5258
- EISBN:
- 9781713899921
- Language:
- English
- Pubs id:
- 1994747
- Local pid:
- pubs:1994747
- Deposit date:
- 2026-06-16
- ARK identifier:
Terms of use
- Copyright holder:
- Luo et al. and NeurIPS
- Copyright date:
- 2023
- Rights statement:
- © (2023) by individual authors and Neural Information Processing Systems Foundation Inc. All rights reserved.
- Notes:
- This is the accepted manuscript version of the article. The final version is available online from Curran Associates at https://dx.doi.org/10.52202/075280-2662