Robust analysis of discounted Markov decision processes with uncertain transition probabilities

Zhen kai Lou; Fu jun Hou; Xu ming Lou

doi:10.1007/s11766-020-3664-1

Zhen kai Lou, Fu jun Hou*, Xu ming Lou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.

Original language	English
Pages (from-to)	417-436
Number of pages	20
Journal	Applied Mathematics
Volume	35
Issue number	4
DOIs	https://doi.org/10.1007/s11766-020-3664-1
Publication status	Published - Oct 2020

Keywords

60J10
90C05
90C40
Markov decision processes
robust optimal policy
robustness and sensitivity
uncertain transition probabilities
value interval

Cite this

Lou, Z. K., Hou, F. J., & Lou, X. M. (2020). Robust analysis of discounted Markov decision processes with uncertain transition probabilities. Applied Mathematics, 35(4), 417-436. https://doi.org/10.1007/s11766-020-3664-1

@article{8c6433125fb641b99f7f9eddbb5bf395,

title = "Robust analysis of discounted Markov decision processes with uncertain transition probabilities",

abstract = "Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.",

keywords = "60J10, 90C05, 90C40, Markov decision processes, robust optimal policy, robustness and sensitivity, uncertain transition probabilities, value interval",

author = "Lou, {Zhen kai} and Hou, {Fu jun} and Lou, {Xu ming}",

note = "Publisher Copyright: {\textcopyright} 2020, Editorial Committee of Applied Mathematics.",

year = "2020",

month = oct,

doi = "10.1007/s11766-020-3664-1",

language = "English",

volume = "35",

pages = "417--436",

journal = "Applied Mathematics",

issn = "1005-1031",

publisher = "Springer Verlag",

number = "4",

}

TY - JOUR

T1 - Robust analysis of discounted Markov decision processes with uncertain transition probabilities

AU - Lou, Zhen kai

AU - Hou, Fu jun

AU - Lou, Xu ming

PY - 2020/10

Y1 - 2020/10

AB - Optimal policies in Markov decision problems may be quite sensitive with regard to transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.

KW - 60J10

KW - 90C05

KW - 90C40

KW - Markov decision processes

KW - robust optimal policy

KW - robustness and sensitivity

KW - uncertain transition probabilities

KW - value interval

UR - http://www.scopus.com/inward/record.url?scp=85098112341&partnerID=8YFLogxK

U2 - 10.1007/s11766-020-3664-1

DO - 10.1007/s11766-020-3664-1

M3 - Article

AN - SCOPUS:85098112341

SN - 1005-1031

VL - 35

SP - 417

EP - 436

JO - Applied Mathematics

JF - Applied Mathematics

IS - 4

ER -
