IEEE/CAA Journal of Automatica Sinica
Citation: B. Yang, C. Tang, Y. Liu, G. Wen, and G. Chen, “A linear programming-based reinforcement learning mechanism for incomplete-information games,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 11, pp. 2340–2342, Nov. 2024. doi: 10.1109/JAS.2024.124464
[1] H. Kebriaei, A. Rahimi-Kian, and M. N. Ahmadabadi, “Model-based and learning-based decision making in incomplete information Cournot games: A state estimation approach,” IEEE Trans. Syst. Man Cybern. Syst., vol. 45, no. 4, pp. 713–718, Apr. 2015. doi: 10.1109/TSMC.2014.2373336
[2] L. Xue, C. Sun, D. Wunsch, Y. Zhou, and F. Yu, “An adaptive strategy via reinforcement learning for the prisoner's dilemma game,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 301–310, Jan. 2018. doi: 10.1109/JAS.2017.7510466
[3] W. Zha, J. Chen, and Z. Peng, “Dynamic multi-team antagonistic games model with incomplete information and its application to multi-UAV,” IEEE/CAA J. Autom. Sinica, vol. 2, no. 1, pp. 74–84, Jan. 2015. doi: 10.1109/JAS.2015.7032908
[4] H. Wang, T. Huang, X. Liao, H. Abu-Rub, and G. Chen, “Reinforcement learning for constrained energy trading games with incomplete information,” IEEE Trans. Cybern., vol. 47, no. 10, pp. 3404–3416, Oct. 2017. doi: 10.1109/TCYB.2016.2539300
[5] D. Shen, “Iterative learning control with incomplete information: A survey,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 5, pp. 885–901, Sep. 2018. doi: 10.1109/JAS.2018.7511123
[6] G. Wen, J. Fu, P. Dai, and J. Zhou, “DTDE: A new cooperative multiagent reinforcement learning framework,” The Innovation, vol. 2, no. 4, p. 100162, Sep. 2021. doi: 10.1016/j.xinn.2021.100162
[7] J. Tsitsiklis, “Asynchronous stochastic approximation and Q-learning,” Mach. Learn., vol. 16, no. 3, pp. 185–202, Sep. 1994.
[8] Y. Zhou, J. Li, and J. Zhu, “Posterior sampling for multi-agent reinforcement learning: Solving extensive games with imperfect information,” in Proc. Int. Conf. Learn. Represent., 2020.
[9] L. Meng, Z. Ge, P. Tian, B. An, and Y. Gao, “An efficient deep reinforcement learning algorithm for solving imperfect information extensive-form games,” in Proc. AAAI Conf. Artif. Intell., Jun. 2023, vol. 37, no. 5, pp. 5823–5831.
[10] E. Lockhart, M. Lanctot, J. Pérolat, J. Lespiau, D. Morrill, F. Timbers, and K. Tuyls, “Computing approximate equilibria in sequential adversarial games by exploitability descent,” in Proc. Int. Joint Conf. Artif. Intell., 2019, pp. 464–470.
[11] S. Srinivasan, M. Lanctot, V. Zambaldi, J. Pérolat, K. Tuyls, R. Munos, and M. Bowling, “Actor-critic policy optimization in partially observable multiagent environments,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 3422–3435.
[12] M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, and T. Graepel, “A unified game-theoretic approach to multiagent reinforcement learning,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4191–4204.
[13] S. Fang and S. Puthenpura, Linear Optimization and Extensions: Theory and Algorithms. Englewood Cliffs, NJ, USA: Prentice Hall, 1993.