2020 Theses Doctoral

# Data-Driven Quickest Change Detection

The quickest change detection (QCD) problem is to detect abrupt changes in a sensing environment as quickly as possible in real time while limiting the risk of false alarms. Statistical inference about the monitored stochastic process is performed through observations acquired sequentially over time. After each observation, the QCD algorithm either stops and declares a change or continues to take a further observation in the next time interval. There is an inherent tradeoff between speed and accuracy in the decision-making process. The design goal is to optimally balance the average detection delay and the false alarm rate so as to respond to abrupt changes in a timely and accurate manner.
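To make the stop-or-continue rule concrete, the following is a minimal sketch of the classical CUSUM procedure, a standard model-based QCD test. It is an illustration only and is not taken from the thesis: the Gaussian mean-shift model and the names `cusum_stopping_time`, `mu0`, `mu1`, `sigma`, and `threshold` are assumptions chosen for the example.

```python
def cusum_stopping_time(observations, mu0, mu1, sigma, threshold):
    """Return the 1-based time at which CUSUM declares a change, or None.

    Assumed model (illustrative): i.i.d. Gaussian samples with known
    variance sigma**2, mean mu0 before the change and mu1 after it.
    """
    stat = 0.0
    for t, x in enumerate(observations, start=1):
        # Log-likelihood ratio of the post-change vs. pre-change density.
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
        # Accumulate evidence, resetting to zero when it turns negative.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t  # stop and declare a change
    return None  # never stopped within the observed stream


# Deterministic toy stream: the mean shifts from 0 to 1 after sample 50.
stream = [0.0] * 50 + [1.0] * 50
alarm_low = cusum_stopping_time(stream, mu0=0.0, mu1=1.0, sigma=1.0, threshold=5.0)
alarm_high = cusum_stopping_time(stream, mu0=0.0, mu1=1.0, sigma=1.0, threshold=10.0)
```

Raising `threshold` makes false alarms less likely but delays the alarm, which is the speed-accuracy tradeoff described above: on this toy stream the higher threshold stops later than the lower one.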

The objective of this thesis is to investigate effective and scalable QCD approaches for real-world data streams. The classical QCD framework is model-based, that is, a statistical data model is assumed to be known for both the pre- and post-change cases. However, real-world data often pose significant challenges for data modeling, such as high dimensionality, complex multivariate nature, lack of parametric models, unknown post-change (e.g., attack or anomaly) patterns, and complex temporal correlation. Further, in some cases, the data is privacy-sensitive and distributed over a system, and it is not fully available to the QCD algorithm. This thesis addresses these challenges and proposes novel data-driven QCD approaches that are robust to data model mismatch and hence widely applicable to a variety of practical settings.

In Chapter 2, online cyber-attack detection in the smart power grid is formulated as a partially observable Markov decision process (POMDP) problem based on the QCD framework. A universal robust online cyber-attack detection algorithm is proposed using model-free reinforcement learning (RL) for POMDPs. In Chapter 3, online anomaly detection for big data streams is studied where the nominal (i.e., pre-change) and anomalous (i.e., post-change) high-dimensional statistical data models are unknown. A data-driven solution approach is proposed, in which a set of useful univariate summary statistics is first computed from a nominal dataset in an offline phase, and online summary statistics are then evaluated for a persistent deviation from the nominal statistics.
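The two-phase idea in Chapter 3 (offline nominal statistics, then online monitoring for persistent deviation) can be sketched as follows. This is a hedged illustration, not the thesis's algorithm: the choice of summary statistic (Euclidean distance to the nominal mean), the quantile baseline, and the names `fit_nominal`, `detect`, `quantile`, and `threshold` are all assumptions made for the example.

```python
import math


def fit_nominal(nominal, quantile=0.95):
    """Offline phase: learn a center and a baseline distance from nominal data.

    The summary statistic here (distance to the nominal mean) is an
    illustrative assumption, not the statistic used in the thesis.
    """
    dim = len(nominal[0])
    center = [sum(row[j] for row in nominal) / len(nominal) for j in range(dim)]
    dists = sorted(math.dist(row, center) for row in nominal)
    # Empirical quantile of the nominal distances serves as the baseline.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]


def detect(stream, center, baseline, threshold):
    """Online phase: accumulate excess distance over the baseline (CUSUM-like)."""
    stat = 0.0
    for t, x in enumerate(stream, start=1):
        # Only a persistent excess over the baseline can build up evidence;
        # isolated outliers are forgotten once the statistic resets to zero.
        stat = max(0.0, stat + math.dist(x, center) - baseline)
        if stat >= threshold:
            return t
    return None


nominal = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
center, baseline = fit_nominal(nominal)
stream = [[0.5, 0.0]] * 5 + [[3.0, 0.0]] * 5  # anomaly starts at sample 6
alarm_time = detect(stream, center, baseline, threshold=3.5)
```

No pre- or post-change model is specified anywhere in the sketch; only nominal samples are used, which is what distinguishes this data-driven setting from the model-based one.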

In Chapter 4, a generic data-driven QCD procedure called DeepQCD is proposed, which learns the change detection rule directly from the observed raw data via deep recurrent neural networks. With a sufficient amount of training data including both pre- and post-change samples, DeepQCD can effectively learn the change detection rule for all complex, high-dimensional, and temporally correlated data streams. Finally, in Chapter 5, online privacy-preserving anomaly detection is studied in a setting where the data is distributed over a network and locally sensitive to each node, and its statistical model is unknown. A data-driven differentially private distributed detection scheme is proposed, which infers network-wide anomalies based on the perturbed and encrypted statistics received from nodes. Furthermore, the analytical privacy-security tradeoff in the network-wide anomaly detection problem is investigated.
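As a rough illustration of the perturbation step in Chapter 5, the sketch below has each node clip its local statistic and add Laplace noise before reporting it, which is the standard Laplace mechanism for differential privacy. It is an assumption-laden example rather than the thesis's scheme: the encryption step is omitted, and the names `privatize`, `aggregate`, `bound`, and `epsilon` are hypothetical.

```python
import random


def privatize(value, bound, epsilon, rng):
    """Clip a local statistic to [0, bound] and add Laplace noise.

    With sensitivity `bound`, Laplace noise of scale bound / epsilon is the
    standard Laplace mechanism; smaller epsilon means stronger privacy.
    """
    clipped = min(max(value, 0.0), bound)
    scale = bound / epsilon
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise


def aggregate(local_stats, bound, epsilon, rng):
    """Network-wide statistic: the sum of the nodes' perturbed reports."""
    return sum(privatize(s, bound, epsilon, rng) for s in local_stats)


rng = random.Random(0)
local_stats = [0.2, 0.4, 5.0]  # the last value exceeds the bound and is clipped
weak_privacy = aggregate(local_stats, bound=1.0, epsilon=1e9, rng=rng)
strong_privacy = aggregate(local_stats, bound=1.0, epsilon=0.1, rng=rng)
```

With a very large `epsilon` the aggregate is essentially the clipped sum, while a small `epsilon` injects heavy noise that makes network-wide anomalies harder to detect. This is one simple way to see why privacy and detection performance trade off against each other.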

## Files

- Kurt_columbia_0054D_15905.pdf (application/pdf, 2.12 MB)

## More About This Work

- Academic Units: Electrical Engineering
- Thesis Advisors: Wang, Xiaodong
- Degree: Ph.D., Columbia University
- Published Here: June 24, 2020