Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

Li, Lihong; Chu, Wei; Langford, John; Wang, Xuanhui

doi:10.1145/1935826.1935878

arXiv:1003.5956 (cs)

[Submitted on 31 Mar 2010 (v1), last revised 1 Mar 2012 (this version, v2)]

Abstract: Contextual bandit algorithms have become popular for online recommendation systems such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences, but it is very challenging because of their "partial-label" nature. Common practice is to create a simulator that models the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult, and modeling bias is usually unavoidable. In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page agree well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms demonstrate the accuracy and effectiveness of our offline evaluation method.
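
The abstract names the replay methodology without spelling out its steps. The following is a minimal sketch of a replay-style evaluator, not the authors' code. It assumes the logged events were collected by showing arms uniformly at random, and it counts an event only when the evaluated policy picks the same arm that was logged. All names (`replay_evaluate`, `make_log`, `choose_arm`) and the synthetic reward rule are hypothetical and serve only as illustration.

```python
import random
from typing import Callable, Hashable, Iterable, List, Tuple

# One logged event: (context, arm shown by the logging policy, observed reward).
Event = Tuple[Hashable, int, float]


def replay_evaluate(
    events: Iterable[Event],
    choose_arm: Callable[[Hashable, List[Event]], int],
) -> Tuple[float, int]:
    """Return (average reward over matched events, number of matched events)."""
    history: List[Event] = []  # events the policy has been allowed to "see"
    total_reward = 0.0
    for context, logged_arm, reward in events:
        # The reward is known only for the logged arm ("partial label"),
        # so events where the policy disagrees with the log are skipped.
        if choose_arm(context, history) != logged_arm:
            continue
        history.append((context, logged_arm, reward))
        total_reward += reward
    matched = len(history)
    return (total_reward / matched if matched else 0.0), matched


def make_log(n_events: int, n_arms: int, seed: int) -> List[Event]:
    """Synthetic log (assumption): uniform random arms, arm 1 always pays 1.0."""
    rng = random.Random(seed)
    log: List[Event] = []
    for _ in range(n_events):
        context = rng.randrange(10)
        arm = rng.randrange(n_arms)  # uniformly random logging policy
        reward = 1.0 if arm == 1 else 0.0
        log.append((context, arm, reward))
    return log


log = make_log(1000, 2, seed=0)
# Two fixed policies that ignore context and history.
value_arm1, matched_arm1 = replay_evaluate(log, lambda ctx, hist: 1)
value_arm0, matched_arm0 = replay_evaluate(log, lambda ctx, hist: 0)
print(value_arm1, value_arm0)
```

In this toy log the policy that always picks arm 1 is credited only with events where arm 1 was shown, so its estimated per-event reward reflects arm 1 alone. A learning policy would instead use `history` to update its model after each matched event.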

Comments:	10 pages, 7 figures, revised from the published version at the WSDM 2011 conference
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
ACM classes:	H.3.5; I.2.6
Cite as:	arXiv:1003.5956 [cs.LG]
	(or arXiv:1003.5956v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1003.5956
Related DOI:	https://doi.org/10.1145/1935826.1935878

Submission history

From: Lihong Li
[v1] Wed, 31 Mar 2010 01:20:07 UTC (312 KB)
[v2] Thu, 1 Mar 2012 23:33:07 UTC (318 KB)
