An Unbiased, Data-Driven, Offline Evaluation Method of Contextual Bandit Algorithms

Li, Lihong; Chu, Wei; Langford, John

arXiv:1003.5956v1 (cs)

[Submitted on 31 Mar 2010 (this version), latest version 1 Mar 2012 (v2)]

Abstract: Offline evaluation of reinforcement learning algorithms based on collected data (state transitions and rewards) has remained a challenging problem. Common practice is to create a simulator based on collected data and then run the algorithm against this simulator. Such an approach involves creating a simulator of the problem at hand, which is often difficult and may introduce bias to the evaluation results. In this paper, we introduce an offline evaluation method for a subclass of reinforcement learning problems known as contextual bandits. This method is completely driven by data, does not require building a simulator, and gives provably unbiased evaluation results. Its effectiveness is also empirically validated using a large-scale news article recommendation dataset collected from Yahoo! Frontpage.
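The abstract does not spell out how a simulator-free estimate can be unbiased. One well-known way to get there is a "replay" estimator: log events while choosing arms uniformly at random, step through the log, and keep an event only when the policy under evaluation picks the same arm that was logged. The Python below is a minimal sketch under that assumption (uniformly random logging). The event tuple layout, the `Policy` signature, and the synthetic `simulate_uniform_log` generator are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One logged event: (context, available arms, arm actually shown, observed reward).
Event = Tuple[Sequence[float], List[int], int, float]
# A policy picks an arm given the events it has accepted so far, the context, and the arms.
Policy = Callable[[List[Event], Sequence[float], List[int]], int]


def replay_evaluate(policy: Policy, log: List[Event]) -> Tuple[float, int]:
    """Estimate a policy's average per-trial reward by replaying a log.

    Assumption: the logging policy chose each arm uniformly at random from
    the K available arms, so whenever the evaluated policy picks one of those
    arms, it matches the logged arm with probability 1/K regardless of context.
    """
    history: List[Event] = []  # events "revealed" to the evaluated policy
    total_reward = 0.0
    for event in log:
        context, arms, logged_arm, reward = event
        chosen = policy(history, context, arms)
        if chosen == logged_arm:
            # The logged reward is exactly what this policy would have received.
            history.append(event)
            total_reward += reward
        # On a mismatch the event is skipped and the policy's history is unchanged.
    matched = len(history)
    return (total_reward / matched if matched else float("nan")), matched


def simulate_uniform_log(n: int, ctr: Sequence[float], seed: int = 0) -> List[Event]:
    """Hypothetical synthetic log: uniform-random arm choice, Bernoulli clicks."""
    rng = random.Random(seed)
    arms = list(range(len(ctr)))
    log: List[Event] = []
    for _ in range(n):
        arm = rng.choice(arms)
        reward = 1.0 if rng.random() < ctr[arm] else 0.0
        log.append(([1.0], arms, arm, reward))
    return log


if __name__ == "__main__":
    log = simulate_uniform_log(30_000, ctr=[0.1, 0.5, 0.2])

    def always_arm_1(history: List[Event], context: Sequence[float], arms: List[int]) -> int:
        return 1  # a fixed, non-adaptive policy

    estimate, matched = replay_evaluate(always_arm_1, log)
    # The estimate should be close to arm 1's click rate (0.5); about 1/3 of events match.
    print(f"estimate={estimate:.3f} matched={matched}")
```

With K arms and uniform logging, only about a 1/K fraction of the log survives the matching step, so a large log is needed. With a non-uniform logging policy, this plain matching rule would in general give a biased estimate unless the kept events were reweighted.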

Comments:	13 pages, 5 figures, in submission (under review).
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
Cite as:	arXiv:1003.5956 [cs.LG]
	(or arXiv:1003.5956v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1003.5956

Submission history

From: Lihong Li
[v1] Wed, 31 Mar 2010 01:20:07 UTC (312 KB)
[v2] Thu, 1 Mar 2012 23:33:07 UTC (318 KB)
