Stable Subsampling for Streaming Data under Data Shift

Time: 2026-04-19 09:30

Venue: Conference Room 2432, College Building No. 2

Speaker: Zhou Yongdao (周永道)

Speaker biography:

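Zhou Yongdao is a professor and doctoral supervisor at the School of Statistics and Data Science, Nankai University, where he chairs the Department of Statistics. He has been selected for the youth category of the national high-level talent program, as a Tianjin Leading Talent in Innovation, and for Nankai University's Hundred Young Academic Leaders program (南开大学百青). His research interests are experimental design and big data analysis. He has led 4 National Natural Science Foundation of China (NSFC) grants, 1 sub-project of an NSFC Key Program, 1 Key Project of the Tianjin Natural Science Foundation, and more than 10 other government-funded and industry-sponsored projects. He has visited 5 universities outside mainland China, including UCLA. He has published more than 80 papers in leading statistics and machine learning journals and conferences, including JRSSB, JASA, Biometrika, JMLR, TKDE, NeurIPS, and ICLR, and has co-authored 9 monographs and textbooks in Chinese and English. His honors include the First Prize of the National Award for Outstanding Achievements in Statistical Science Research, the Third Prize of the National Statistical Science and Technology Progress Award, the Special Prize of the Tianjin Teaching Achievement Award, and the Huawei Spark Award. He currently serves as president of the Tianjin Association for Applied Statistics (天津市现场统计研究会); council member of the Chinese Mathematical Society and vice president of its Uniform Design Branch; vice president of both the Educational Statistics and Management Branch and the Multivariate Analysis Application Committee of the Chinese Association for Applied Statistics; and editorial board member of Scientific Reports, Chinese Journal of Applied Probability and Statistics (《应用概率统计》), and Journal of Sichuan University (Natural Science Edition) (《四川大学学报(自然科学版)》).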

Abstract:

Driven by the urgent need for online decision making in continuous monitoring systems, learning from large-scale streaming data has become a fundamental task in modern statistics and machine learning. However, these streams commonly exhibit non-stationary data shifts that are often overlooked by existing methods, leading to both computational inefficiencies and performance degradation. Consequently, maintaining a representative subsample is essential to adapt to evolving distributions while efficiently reducing data size. In this paper, we propose an online model-free Stable Streaming Data Subsampling (SSDS) algorithm designed to continuously maintain a fixed-size subset for nonparametric model prediction. SSDS employs a fast iterative replacement scheme guided by a normalized weighted improvement metric that integrates temporal recency with spatial uniformity. Theoretically, we establish that by simultaneously controlling these spatio-temporal properties, the estimator derived from the selected subsample attains asymptotically minimax optimality regarding the integrated mean squared prediction error for Gaussian process models. Empirical evaluations on synthetic and real-world datasets demonstrate that SSDS significantly outperforms existing strategies, validating it as a computationally efficient framework that ensures stable predictions in non-stationary environments.

Host: Qi Liangwei (戚良玮)