EPyT-Flow: A Toolkit for Generating Water Distribution Network Data.
https://doi.org/10.21105/joss.07104

Supervised online learning relies on the assumption that ground truth information is available for model updates at each time step. As this is not realistic in every setting, alternatives such as active online learning or online learning with verification latency have been proposed. In this work, we assume that no label information is available after initial training. We argue that, provided the expected concept drift can be characterized as incremental drift, we can rely on a self-labeling strategy to keep models updated. We derive a k-NN-based self-labeling online learner implementing the presented self-supervised scheme and show experimentally that this is a viable option for learning from incrementally drifting data streams in the absence of label information.
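The following minimal sketch illustrates the self-labeling idea. It assumes a plain k-NN classifier with a fixed-size sliding-window memory. After initial training on labeled data, each unlabeled sample is classified and then stored under its own predicted label, so the memory can follow a slow, incremental drift. The class `SelfLabelingKNN`, its parameters, and the toy stream are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter, deque


class SelfLabelingKNN:
    """Hypothetical k-NN learner that keeps itself up to date with its own predictions."""

    def __init__(self, k=3, window=20):
        self.k = k
        # Sliding window of (features, label) pairs; the oldest entries drop out.
        self.memory = deque(maxlen=window)

    def fit(self, X, y):
        # Initial supervised phase: the only time true labels are used.
        for x, label in zip(X, y):
            self.memory.append((tuple(x), label))

    def predict(self, x):
        # Majority vote among the k nearest stored samples (Euclidean distance).
        nearest = sorted(self.memory, key=lambda item: math.dist(item[0], x))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def learn_unlabeled(self, x):
        # Self-labeling step: predict, then store the sample under the predicted label.
        label = self.predict(x)
        self.memory.append((tuple(x), label))
        return label


# Toy stream: class "a" starts near 0, class "b" near 10; both drift slowly upwards.
model = SelfLabelingKNN(k=3, window=20)
model.fit([(0.0,), (0.5,), (1.0,), (9.0,), (9.5,), (10.0,)], ["a", "a", "a", "b", "b", "b"])
print(model.predict((10.5,)))  # → b
for step in range(1, 21):
    model.learn_unlabeled((0.5 + 0.5 * step,))
    model.learn_unlabeled((9.5 + 0.5 * step,))
print(model.predict((10.5,)), model.predict((19.5,)))  # → a b
```

Before the drift, the point 10.5 lies among the class-`b` samples. After twenty self-labeled steps, the window has moved along with both classes, so the same point is assigned to `a`. An abrupt jump of one class into the other's region would instead be self-labeled wrongly and then reinforced. This is why the scheme depends on the drift being incremental.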
https://doi.org/10.14428/esann/2024.ES2024-49

Prototype-based methods constitute a robust and transparent family of machine learning models. To increase robustness in real-world applications, they are frequently coupled with reject options. The state-of-the-art method, relative similarity, covers the rejection of samples with both high aleatoric and high epistemic uncertainty. However, the technique lacks transparency, i.e., it does not explain why a sample has been rejected. In this work, we analyze relative similarity analytically and derive an explanation scheme for reject options in prototype-based classification.
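Relative similarity is commonly defined from two distances:

- d_plus, the distance to the closest prototype, whose label is the prediction;
- d_minus, the distance to the closest prototype of a different class.

This gives r(x) = (d_minus - d_plus) / (d_minus + d_plus), a value in [0, 1]. A sample is rejected when r(x) falls below a threshold. The minimal sketch below implements only this reject rule, not the explanation scheme derived in the paper. The function names, threshold, and prototypes are hypothetical.

```python
import math


def relative_similarity(x, prototypes, labels):
    """Return (predicted label, relative similarity) for a nearest-prototype classifier."""
    dists = [math.dist(x, w) for w in prototypes]
    winner = min(range(len(prototypes)), key=lambda i: dists[i])
    d_plus = dists[winner]  # distance to the closest prototype overall
    # Distance to the closest prototype carrying a different label (needs >= 2 classes).
    d_minus = min(d for d, lab in zip(dists, labels) if lab != labels[winner])
    return labels[winner], (d_minus - d_plus) / (d_minus + d_plus)


def classify_with_reject(x, prototypes, labels, threshold=0.2):
    # Reject (return None) when x is nearly equidistant to two classes.
    label, r = relative_similarity(x, prototypes, labels)
    return label if r >= threshold else None


prototypes = [(0.0, 0.0), (4.0, 0.0)]
labels = ["A", "B"]
print(classify_with_reject((0.5, 0.0), prototypes, labels))  # → A
print(classify_with_reject((2.1, 0.0), prototypes, labels))  # → None
```

The first point gets r = (3.5 - 0.5) / 4.0 = 0.75 and is accepted. The second lies almost on the class border (r ≈ 0.05) and is rejected.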
https://doi.org/10.14428/esann/2024.ES2024-156

Feature selection is one of the most relevant preprocessing and analysis techniques in machine learning, enabling both improved model performance and knowledge discovery. In online setups, both can be affected by concept drift, i.e., changes of the underlying distribution. Recently, an adaptation of classical feature relevance approaches to drift detection was introduced. While the method increases detection performance significantly, its explanatory aspects have received little discussion. In this work, we focus on understanding the structure of the ongoing drift by transferring the concept of strongly and weakly relevant features to it. We empirically evaluate our methodology using graphical models.
https://doi.org/10.14428/esann/2024.ES2024-89

Especially when artificial intelligence (AI)-supported decisions affect society, the fairness of such AI-based methodologies constitutes an important area of research. In this contribution, we investigate the application of AI to the socioeconomically relevant infrastructure of water distribution systems (WDSs). We propose an appropriate definition of protected groups in WDSs and generalized definitions of group fairness. These definitions are applicable even to multiple non-binary sensitive features and provably coincide with existing definitions for a single binary sensitive feature. We demonstrate that typical methods for the detection of leakages in WDSs are unfair in this sense. We therefore propose a general fairness-enhancing framework to increase the fairness of the AI-based algorithm. It extends not only the specific leakage detection pipeline but also arbitrary learning schemes. Finally, we evaluate and compare several specific instantiations of this framework on a toy WDS and on a realistic WDS to show their utility.
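The following minimal sketch illustrates group fairness for a leakage detector. It assumes equal opportunity (equal detection rates for actual leaks) as the base notion. Per-group true positive rates are computed, and the largest gap between any two groups is reported. Groups may be plain labels or tuples combining several sensitive features. For a single binary feature, the gap reduces to the usual |TPR_a - TPR_b|. This max-gap generalization is an illustrative choice and not necessarily the definition proposed in the paper. The district names and predictions are made up.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Per-group share of actual leaks (y_true == 1) that the detector flagged.

    Groups without any actual leak do not appear in the result.
    """
    hits, positives = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:  # only actual leak events count towards the detection rate
            positives[g] += 1
            hits[g] += int(p == 1)
    return {g: hits[g] / positives[g] for g in positives}


def max_tpr_gap(y_true, y_pred, groups):
    # Largest detection-rate difference between any two groups (0 means equal rates).
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical leak events in three city districts (a non-binary sensitive feature).
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "B", "B", "C", "C", "A", "C"]
print(true_positive_rates(y_true, y_pred, groups))  # → {'A': 1.0, 'B': 0.5, 'C': 0.5}
print(max_tpr_gap(y_true, y_pred, groups))  # → 0.5
```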
https://doi.org/10.7717/peerj-cs.2317

The more AI-supported decisions affect people's lives, the more important the fairness of such decisions becomes. This contribution introduces research on fairness in AI systems, explains the key fairness definitions and strategies for achieving fairness using concrete examples, and places fairness research in the European context. Neither European legislation nor AI research has reached a consensus on how fairness should be defined and achieved. Instead, each system requires a differentiated, context-dependent examination of potentially unfair outcomes and their consequences. This contribution can support such an examination. It is aimed at an interdisciplinary audience and therefore avoids mathematical formulations, using visualizations and examples instead.
https://doi.org/10.1007/978-3-658-43816-6_9

Facing climate change, the already limited availability of drinking water will decrease in the future, rendering drinking water an increasingly scarce resource. Considerable amounts of it are lost through leakages in water transportation and distribution networks. Thus, anomaly detection and localization, in particular for leakages, are crucial but challenging tasks due to the complex interactions and changing demands in water distribution networks. In this work, we conceptually analyze the effects of anomalies on the dynamics of critical infrastructure systems by modeling them with Bayesian networks. We then discuss how the problem is connected to and can be considered through the lens of concept drift. This analysis yields our proposal to leverage model-based drift explanations as a tool for localizing anomalies given limited information about the network. The methodology is experimentally evaluated using realistic benchmark scenarios. To showcase that our methodology applies to critical infrastructure more generally, in addition to considering leakages and sensor faults in water systems, we investigate the suitability of the derived technique to localize sensor faults in power systems.
https://doi.org/10.1109/IJCNN60899.2024.10651472

Research on methods for planning and controlling water distribution networks is gaining relevance, as the availability of drinking water will decrease as a consequence of climate change. So far, most approaches are based on hydraulics and engineering expertise. However, with the increasing availability of sensors, machine learning techniques constitute a promising tool. This work presents the main tasks in water distribution networks and discusses how they relate to machine learning. It also analyzes how the particularities of the domain pose challenges to machine learning approaches and how they can be leveraged by them. In addition, it provides a technical toolkit by presenting evaluation benchmarks and a structured survey of the exemplary task of leakage detection and localization.
https://doi.org/10.1007/978-3-031-72356-8_11

Feature selection is one of the most relevant preprocessing and analysis techniques in machine learning. It can dramatically increase the performance of learning algorithms and at the same time provide relevant information on the data. In online and stream learning, concept drift, i.e., changes of the underlying distribution over time, can cause significant problems for learning models and data analysis. Feature selection methods for online learning exist. However, none of them targets feature selection for drift detection, i.e., the challenge of increasing the performance of drift detectors by analyzing the drift rather than increasing model accuracy. This challenge is particularly relevant in common unsupervised scenarios. In this work, we study feature selection for drift detection and drift monitoring. We develop a formal definition of a feature-wise notion of drift that allows semantic interpretation. In addition, we derive an efficient algorithm by reducing the problem to classical feature selection. We also analyze, on a theoretical level, the applicability of our approach to feature selection for drift detection. Finally, we empirically show the relevance of our considerations on several benchmarks.
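The following minimal sketch follows the idea of reducing drift analysis to classical feature selection, assuming two windows. Every sample is labeled with the window it came from. Each feature is then scored by a simple univariate filter: the shift of its mean, standardized by the reference window's spread (similar in spirit to a t-statistic). This filter is an illustrative choice. Unlike the approach in the paper, it cannot detect features that drift only jointly, nor changes that leave the mean untouched. The function name and toy data are hypothetical.

```python
from statistics import mean, pstdev


def featurewise_drift(samples, window_labels):
    """Univariate filter score per feature for predicting the window label.

    samples: list of feature vectors; window_labels: 0 = before, 1 = after.
    A large score means the feature separates the two windows, i.e. it drifted.
    """
    scores = []
    for j in range(len(samples[0])):
        old = [x[j] for x, w in zip(samples, window_labels) if w == 0]
        new = [x[j] for x, w in zip(samples, window_labels) if w == 1]
        scale = pstdev(old) or 1.0  # reference spread; fall back to 1 for constant features
        scores.append(abs(mean(new) - mean(old)) / scale)
    return scores


# Toy stream: feature 0 barely moves, feature 1 shifts upwards after the change point.
before = [[i / 50, i / 50] for i in range(50)]
after = [[(i + 0.5) / 50, i / 50 + 0.61] for i in range(50)]
scores = featurewise_drift(before + after, [0] * 50 + [1] * 50)
print([round(s, 2) for s in scores])  # → [0.03, 2.11]
```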
https://doi.org/10.1016/j.neucom.2024.127968

The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this study, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, no work has so far reviewed the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection, which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on unsupervised drift detection. In addition to a comprehensive literature review, it offers precise mathematical definitions of the considered problems. It also contains standardized experiments on parametric artificial datasets that allow for a direct comparison of different detection strategies. Thus, the suitability of different schemes can be analyzed systematically, and guidelines for their use in real-world scenarios can be provided.
https://doi.org/10.3389/frai.2024.1330257

In an increasing number of industrial and technical processes, machine learning-based systems are being entrusted with supervision tasks. While they have been used successfully in many application areas, they frequently fail to generalize to changes in the observed data, which may be caused by environmental changes or degrading sensors. These changes, commonly referred to as concept drift, can trigger malfunctions in the deployed solutions, which are safety-critical in many cases. Thus, detecting and analyzing concept drift is a crucial step when building reliable and robust machine learning-driven solutions. In this work, we consider the setting of unsupervised data streams, which is highly relevant for different monitoring and anomaly detection scenarios. In particular, we focus on the tasks of localizing and explaining concept drift, which are crucial to enable human operators to take appropriate action. In addition to providing precise mathematical definitions of the problem of concept drift localization, we survey the body of literature on this topic. By performing standardized experiments on parametric artificial datasets, we provide a direct comparison of different strategies. Thereby, we can systematically analyze the properties of different schemes and suggest first guidelines for practical applications. Finally, we explore the emerging topic of explaining concept drift.
https://doi.org/10.3389/frai.2024.1330258

Drinking water is a vital resource for humanity, and thus, Water Distribution Networks (WDNs) are considered critical infrastructures in modern societies. The operation of WDNs is subject to diverse challenges such as water leakages and contamination, cyber/physical attacks, and high energy consumption during pump operation. With model-based methods reaching their limits due to various sources of uncertainty, AI methods offer promising solutions to these challenges. In this work, we introduce a Python toolbox for complex scenario modeling and generation that lets AI researchers easily access challenging problems from the drinking water domain. Besides providing a high-level interface for the easy generation of hydraulic and water quality scenario data, it also provides easy access to popular event detection benchmarks and an environment for developing control algorithms.
https://doi.org/10.48550/arXiv.2406.02078

Concept drift, i.e., the change of the data generating distribution, can render machine learning models inaccurate. Several works address the phenomenon of concept drift in the streaming context, usually assuming that consecutive data points are independent of each other. To generalize to dependent data, many authors link the notion of concept drift to time series. In this work, we show that temporal dependencies strongly influence the sampling process. Thus, the definitions in use need major modifications. In particular, we show that the notion of stationarity is not suited for this setup and discuss an alternative we refer to as consistency. We demonstrate in numerical experiments that consistency better describes the observable learning behavior.
https://doi.org/10.1007/978-3-031-58547-0_7

Leakages are a major risk in water distribution networks as they cause water loss and increase contamination risks. Leakage detection is a difficult task due to the complex dynamics of water distribution networks. In particular, small leakages are hard to detect. From a machine learning perspective, leakages can be modeled as concept drift. Thus, a wide variety of drift detection schemes appear to be suitable candidates for detecting leakages. In this work, we explore the potential of model-loss-based and distribution-based drift detection methods to tackle leakage detection. We additionally discuss the issue of temporal dependencies in the data and propose a way to cope with it when applying distribution-based detection. We systematically evaluate different methods for leakages of different sizes and detection times. Additionally, we propose a first drift-detection-based technique for localizing leakages.
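In its simplest form, a distribution-based detector compares a reference window of sensor readings with the most recent window using a two-sample Kolmogorov–Smirnov test. The minimal sketch below assumes a single pressure-like sensor and a 5% significance level. It uses the standard asymptotic critical value c(α)·sqrt((n + m)/(n·m)) with c(α) = sqrt(-ln(α/2)/2). The sketch treats the readings as independent samples, which is exactly the assumption that temporal dependencies violate; the correction proposed in the paper is not reproduced here. The function names and toy readings are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    # Largest gap between the two empirical CDFs; it is attained at an observed value.
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b)) for v in a + b)


def drift_detected(reference, current, alpha=0.05):
    # Reject "same distribution" if the statistic exceeds the asymptotic critical value.
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))


# Toy readings on a regular grid; the leak is mimicked as a downward pressure shift.
reference = [50 + i / 100 for i in range(100)]
no_leak = [50 + (i + 0.5) / 100 for i in range(100)]
leak = [49.7 + i / 100 for i in range(100)]
print(drift_detected(reference, no_leak), drift_detected(reference, leak))  # → False True
```

With 100 readings per window, the threshold is about 0.19. The shifted window reaches a statistic of about 0.3, while the unshifted one stays near 0.01.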
https://doi.org/10.5220/0012361200003654

Water distribution systems (WDS) are an integral part of critical infrastructure, which is pivotal to urban development. As 70% of the world's population will likely live in urban environments by 2050, efficient simulation and planning tools for WDS play a crucial role in reaching the UN's Sustainable Development Goal (SDG) 6, "Clean water and sanitation for all". In this realm, we propose a novel and efficient machine learning emulator, more precisely, a physics-informed deep learning (DL) model, for hydraulic state estimation in WDS. Using a recursive approach, our model needs only a few graph convolutional neural network (GCN) layers and employs an innovative algorithm based on message passing. Unlike in conventional machine learning tasks, the model uses hydraulic principles to infer two additional hydraulic state features while reconstructing the available ground truth feature in an unsupervised manner. To the best of our knowledge, this is the first DL approach to emulate the popular hydraulic simulator EPANET without utilizing any additional information. Like most DL models and unlike the hydraulic simulator, our model achieves vastly faster emulation times that do not increase drastically with the size of the WDS. Moreover, we achieve high accuracy on the ground truth and results very similar to those of the hydraulic simulator, as demonstrated through experiments on five real-world WDS datasets.
https://doi.org/10.1609/aaai.v38i20.30192

Long-term water network planning methods need to be adaptive under deep uncertainty. Reinforcement learning (RL) is a promising approach for decision-making under uncertainty. We propose the application of reinforcement learning for the design of water networks. Results show that an RL agent can find feasible solutions to deterministic problems. This is a first step towards the development of more adaptive planning approaches in the field.
https://virtual.oxfordabstracts.com/#/event/3937/submission/54

There is an emerging need for predictive models to be trained on the fly, since in numerous machine learning applications data arrive in an online fashion. A critical challenge is the limited availability of ground truth information (e.g., labels in classification tasks) as new data are observed one by one online; another significant challenge is class imbalance. This work introduces the novel Augmented Queues method, which addresses this dual problem. It synergistically combines online active learning, data augmentation, and a multi-queue memory that maintains separate and balanced queues for each class. We perform an extensive experimental study using image and time-series augmentations, in which we examine the roles of the active learning budget, memory size, imbalance level, and neural network type. We demonstrate two major advantages of Augmented Queues. First, it does not reserve additional memory space, as synthetic data are generated only at training time. Second, learning models have access to more labelled data without the need to increase the active learning budget and/or the original memory size. Learning on the fly poses major challenges which typically hinder the deployment of learning models. Augmented Queues significantly improves the performance in terms of learning quality and speed. Our code is made publicly available.
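The multi-queue memory with training-time augmentation can be sketched as follows. The sketch assumes one bounded FIFO queue per class and a simple jitter augmentation. The active learning query strategy is omitted. The class name, queue size, and augmentation are hypothetical choices rather than the authors' code. Because each class has its own queue, a frequent class cannot evict the few stored samples of a rare class. Synthetic samples exist only inside the batch returned for training, so they never occupy memory.

```python
import random
from collections import deque


class MultiQueueMemory:
    """Hypothetical class-balanced memory: one bounded FIFO queue per class."""

    def __init__(self, queue_size=4):
        self.queue_size = queue_size
        self.queues = {}

    def add(self, x, label):
        # Only labeled samples (e.g. those queried by active learning) are stored.
        self.queues.setdefault(label, deque(maxlen=self.queue_size)).append(x)

    def training_batch(self, augment, copies=1):
        # Augmented copies are generated on the fly and never written back to memory.
        batch = []
        for label, queue in self.queues.items():
            for x in queue:
                batch.append((x, label))
                batch.extend((augment(x), label) for _ in range(copies))
        return batch


rng = random.Random(0)


def jitter(x):
    # Time-series style augmentation: add small Gaussian noise to every value.
    return [v + rng.gauss(0.0, 0.01) for v in x]


memory = MultiQueueMemory(queue_size=4)
for t in range(20):
    memory.add([float(t)], "normal")  # frequent class
memory.add([99.0], "anomaly")  # rare class keeps its own slot
batch = memory.training_batch(jitter, copies=1)
print(len(batch), sum(len(q) for q in memory.queues.values()))  # → 10 5
```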
https://zenodo.org/records/7659977

In real-world applications, the process generating the data might suffer from nonstationary effects (e.g., due to seasonality, faults affecting sensors or actuators, and changes in the users' behaviour). These changes, often called concept drift, might have severe (potentially catastrophic) impacts on trained learning models, which become obsolete over time and inadequate to solve the task at hand. Learning in the presence of concept drift aims at designing machine and deep learning models that are able to track and adapt to concept drift. Typically, techniques to handle concept drift are either active or passive, and traditionally, these have been considered mutually exclusive. Active techniques use an explicit drift detection mechanism and retrain the learning algorithm when concept drift is detected. Passive techniques use an implicit method to deal with drift and continually update the model using incremental learning. Unlike existing approaches in the literature, we propose a hybrid alternative that merges the two approaches, thereby leveraging their advantages. The proposed method, called Hybrid-Adaptive REBAlancing (HAREBA), significantly outperforms strong baselines and state-of-the-art methods in terms of learning quality and speed. We also show experimentally that it is effective under severe class imbalance levels.
https://ieeexplore.ieee.org/document/10022140

In today's digital universe, enormous amounts of data are produced in a streaming manner in a variety of application areas. These data are often unlabelled. In this case, identifying infrequent events, such as anomalies, poses a great challenge. This problem becomes even more difficult in non-stationary environments, which can cause the predictive performance of a model to deteriorate. To address these challenges, the paper proposes an autoencoder-based incremental learning method with drift detection (strAEm++DD). Our proposed method strAEm++DD leverages the advantages of both incremental learning and drift detection. We conduct an experimental study using real-world and synthetic datasets with severe or extreme class imbalance, and provide an empirical analysis of strAEm++DD. We further conduct a comparative study, showing that the proposed method significantly outperforms existing baseline and advanced methods.
https://ieeexplore.ieee.org/document/10191328

A significant challenge when attempting to regulate the spatiotemporal concentration of a disinfectant in a water distribution network is the large and uncertain delay between the time that the chemical is injected at the input node and the time that the concentration is measured at the monitoring output nodes. Uncertain time delays are due to varying water flows, which depend mainly on consumer water demands. Existing approaches cannot guarantee that the concentration of the disinfectant will remain within a specified range at the output, even though bounds on time-delay uncertainty may be known. In this work, given bounded water-flow uncertainty, we use the input–output modeling approach to develop a disinfectant scheduling methodology that guarantees a bounded output disinfectant concentration. The proposed methodology creates an input–output model uncertainty characterization by utilizing estimated bounds on water-quality states using the backtracking approach. An optimization problem is formulated and solved to find an input schedule that keeps the disinfectant concentration within predefined bounds for a specified time horizon. Simulation results in two case studies, where water demands varied within ±20% of their nominal value, show that the proposed scheduler is able to avoid lower bound violations of disinfectant concentration.
https://iwaponline.com/jh/article/26/2/386/99925/Disinfection-scheduling-in-water-distribution

This study delves into the differences between incremental and optimized network design, with a focus on tree-shaped water distribution networks (WDNs). The study evaluates the cost overhead of incremental design under two distinct expansion models: random and gradual. Our findings reveal that while incremental design does incur a cost overhead, this overhead does not increase significantly as the network expands, especially under gradual expansion. We also evaluate the cost overhead for the two tree-shaped WDNs of a city in Cyprus. The paper underscores the need to consider the evolution of infrastructure networks, answering key questions about cost overhead, scalability, and design efficacy.
https://link.springer.com/chapter/10.1007/978-3-031-53503-1_21

Water distribution systems are susceptible to contamination events, which can be caused by natural events, accidents, or even malicious attacks. When a contamination event occurs, dangerous substances infiltrating the network may be consumed, thereby harming the consumers’ health and possibly affecting the economy. Advances in sensor and actuator technologies are enabling water networks to become smarter and more resilient to these types of events. This paper provides a broad review of the theoretical, modeling, and computational developments in the area of contamination event diagnosis for water distribution systems. Research is segmented into three main tasks, summarized as “Preparedness”, “Event Detection and Isolation”, and “Emergency Event Management”. The key research topics from each task are described within a unified systems-theoretic mathematical framework, and their open challenges are discussed.
https://www.sciencedirect.com/science/article/pii/S1367578823000159

Together with other infrastructures, water distribution networks (WDNs) constitute a complex and interdependent multi-utility system. Considering interdependencies between WDNs and other urban infrastructures, this work proposes WDN intervention planning using a dynamic multi-utility approach. The approach tackles the challenges of pressure deficits and cascading failures by decoupling the different infrastructure systems. For this purpose, the study develops reliability indices representing the hydraulic and decoupling statuses of WDNs with neighboring infrastructures. The hydraulic reliability represents the robustness of the network against water pressure deficits. The decoupling reliability represents the extent to which WDN elements are decoupled from the elements of other assets. A multi-objective optimization algorithm is employed to develop rehabilitation strategies by introducing three approaches for WDN upgrades following a phased design and construction method. Intervention plans were evaluated based on construction cost, reliability, and cascade effects. Under budget limitations, decoupling a WDN could save significant cascade costs: a 1% improvement in decoupling reliability brings about 157.42 billion Rials in cascade cost savings for asset managers. On the other hand, the decoupled network is weak in terms of hydraulic reliability. This could make it a far less resilient network than the coupled network, with a difference in hydraulic reliability of around 75%.
https://iwaponline.com/jh/article/25/5/2084/97296/Optimal-rehabilitation-planning-for-aged-water

Identifying mechanisms of real-life human decision-making is central to inform effective, human-centric public policy. Here, we report larger trends and synthesize preliminary lessons from behavioral economic and neuro-economic investigations focusing on environmental values. We review the currently available evidence at different levels of granularity, from insights into how individuals value natural resources (individual level), evidence from work on group externalities, common pool resources, and social norms (social group level) to the study of incentives, policies, and their impact (institutional level). At each level, we identify viable directions for future scientific research and actionable items for policy-makers. Coupled with new technological and methodological advances, we suggest that behavioral economic and neuroeconomic insights may inform an effective strategy to optimize environmental resources. We conclude that the time is ripe for action to enrich policies with scientifically grounded insights, making an impact in the interest of current and future generations.
https://www.annualreviews.org/doi/full/10.1146/annurev-resource-101722-082743

Counterfactual explanations (CFEs) are a popular approach in explainable artificial intelligence (xAI), highlighting changes to input data necessary for altering a model’s output. A CFE can either describe a scenario that is better than the factual state (upward CFE) or a scenario that is worse than the factual state (downward CFE). However, the potential benefits and drawbacks of the directionality of CFEs for user behavior in xAI remain unclear. The current user study (N = 161) compares the impact of CFE directionality on the behavior and experience of participants tasked with extracting new knowledge from an automated system based on model predictions and CFEs. Results suggest that upward CFEs provide a significant performance advantage over other forms of counterfactual feedback. Moreover, the study highlights potential benefits of mixed CFEs, which improve user performance compared to downward CFEs or no explanations. In line with the performance results, users’ explicit knowledge of the system is significantly higher after receiving upward CFEs than after downward comparisons. These findings imply that the alignment between explanation and task at hand, the so-called regulatory fit, may play a crucial role in determining the effectiveness of model explanations, informing future research directions in xAI. To ensure reproducible research, the entire code, the underlying models, and the user data of this study are openly available: https://github.com/ukuhl/DirectionalAlienZoo
https://doi.org/10.1007/978-3-031-44070-0_14

Concept drift refers to a change in the data distribution affecting the data stream of future samples. Consequently, learning models operating on the data stream might become obsolete and need costly and difficult adjustments such as retraining or adaptation. Existing methods usually implement a local concept drift adaptation scheme, where either incremental learning of the models is used, or the models are completely retrained when a drift detection mechanism triggers an alarm. This paper proposes an alternative approach: an unsupervised and model-agnostic concept drift adaptation method at the global level, based on autoencoders. Specifically, the proposed method aims to “unlearn” the concept drift without having to retrain or adapt any of the learning models operating on the data. An extensive experimental evaluation is conducted in two application domains. We consider a realistic water distribution network with more than 30 models in place, from which we create 200 simulated datasets/scenarios. We further consider an image-related task to demonstrate the effectiveness of our method.
https://doi.org/10.1109/SSCI52147.2023.10372001

Hyperspectral imaging is a suitable measurement tool across domains. However, when it is combined with machine learning techniques, intensity and transversal shifts frequently hinder the transfer between different sensors and settings. Established approaches focus on eliminating sensor shifts in the data or recalibrating sensors. In this contribution, we target the training procedure, propose robust training, and derive a robust feature selection strategy that can cope with multiple shift dynamics at the same time. We evaluate our approaches experimentally on artificial and real-world datasets.
https://www.esann.org/sites/default/files/proceedings/2023/ES2023-158.pdf

Concept drift refers to the phenomenon that the distribution generating the observed data changes over time. If drift is present, machine learning models can become inaccurate and need adjustment. Methods exist to detect concept drift or to adjust models in the presence of observed drift. However, the question of explaining drift has hardly been considered so far, i.e., describing the potentially complex and high-dimensional change of distributions in a human-understandable fashion. This problem is important since solving it enables an inspection of the most prominent characteristics of how and where drift manifests. Hence, it allows human understanding of the change and increases the acceptance of life-long learning models. In this paper, we present a novel technology characterizing concept drift in terms of the characteristic change of spatial features based on various explanation techniques. To do so, we propose a methodology to reduce the explanation of concept drift to an explanation of models that are trained in a suitable way to extract relevant information from the drift. This way, a large variety of explanation schemes becomes available, and a suitable method can be selected for the problem at hand. We outline the potential of this approach and demonstrate its usefulness in several examples.
https://doi.org/10.1016/j.neucom.2023.126640

This paper introduces EPyT, an open-source Python package that provides a Python-based programming interface to EPANET, the open-source hydraulic and quality modeling software created by the US Environmental Protection Agency. EPyT extends the standard capabilities of the EPANET library through the addition of new methods for research purposes. In addition to the extensive Application Programming Interface, EPyT is accompanied by a collection of water distribution benchmarks and more than 25 code examples that researchers can use as a starting point.
https://joss.theoj.org/papers/10.21105/joss.05947

As relevant examples such as software that predicts future criminals show, the fairness of AI-based decision support tools that affect the social domain constitutes an important area of research. In this contribution, we investigate the application of AI to socioeconomically relevant infrastructures such as water distribution networks (WDNs), where fairness issues have yet to gain a foothold. To establish the notion of fairness in this domain, we propose an appropriate definition of protected groups and group fairness in WDNs as an extension of existing definitions. We demonstrate that typical methods for the detection of leakages in WDNs are unfair in this sense. We therefore propose a remedy to increase fairness that can be applied even to non-differentiable ensemble classification methods as used in this context.
https://github.com/jstrotherm/FairnessInWDNS
https://doi.org/10.1007/978-3-031-43085-5_10

Many machine learning models are vulnerable to adversarial attacks: one can specifically design inputs that cause the model to make a mistake. Our study focuses on adversarials in the security-critical domain of leakage detection in water distribution networks (WDNs). As model input in this application consists of sensor readings, standard adversarial methods face a challenge: they have to create new inputs that still comply with the underlying physics of the network. We propose a novel approach to construct adversarial attacks against machine learning-based leakage detectors in WDNs. In contrast to existing studies, we use a hydraulic model to simulate leaks in the water network. The adversarial attacks are then constructed based on these simulations, which makes them intrinsically physics-constrained. The adversary maximizes water loss by finding the least sensitive point, that is, the point at which the largest possible undetected leak could occur. We provide a mathematical formulation of the least sensitive point problem together with a taxonomy of adversarials in WDNs in order to relate our work to other possible approaches in the field. The problem is then solved using three different algorithmic approaches on two benchmark WDNs. Finally, we discuss the results and reflect on the potential to enhance model robustness based on knowledge about adversarial weaknesses.
https://doi.org/10.1007/978-3-031-43078-7_37
We investigate the task of missing value estimation in graphs, as given by water distribution systems (WDS), based on sparse signals, as a representative machine learning challenge in the domain of critical infrastructure. The underlying graphs have a comparably low node degree and high diameter, while information in the graph is globally relevant; hence, graph neural networks face the challenge of long-term dependencies. We propose a specific architecture based on message passing that achieves excellent results on a number of benchmark tasks in the WDS domain. Further, we investigate a multi-hop variation, which requires considerably fewer resources and opens an avenue towards large WDS graphs.
https://doi.org/10.1007/978-3-031-30047-9_3
In many real-world scenarios, data are provided as a potentially infinite stream of samples that are subject to changes in the underlying data distribution, a phenomenon often referred to as concept drift. A specific facet of concept drift is feature drift, where the relevance of a feature to the problem at hand changes over time. High dimensionality of the data poses an additional challenge to learning algorithms operating in such environments. Common scenarios of this nature can, for example, be found in sensor-based maintenance operations of industrial machines or inside entire networks, such as power grids or water distribution systems. However, since most existing methods for incremental learning focus on classification tasks, efficient online learning for regression is still an underdeveloped area. In this work, we introduce an extension to the SAM-kNN Regressor that incorporates metric learning in order to improve the prediction quality on data streams, gain insights into the relevance of different input features and, based on that, transform the input data into a lower dimension in order to reduce computational complexity and improve suitability for high-dimensional data. We evaluate our proposed method on artificial data to demonstrate its applicability in various scenarios. In addition, we apply the method to the real-world problem of water distribution network monitoring. Specifically, we demonstrate that sensor faults in the water distribution network can be detected by monitoring the feature relevances computed by our algorithm.
https://doi.org/10.1080/08839514.2023.2198846
Introduction
To foster usefulness and accountability of machine learning (ML), it is essential to explain a model's decisions in addition to evaluating its performance. Accordingly, the field of explainable artificial intelligence (XAI) has resurfaced as a topic of active research, offering approaches to address the “how” and “why” of automated decision-making. Within this domain, counterfactual explanations (CFEs) have gained considerable traction as a psychologically grounded approach to generate post-hoc explanations. To do so, CFEs highlight what changes to a model's input would have changed its prediction in a particular way. However, despite the introduction of numerous CFE approaches, their usability has yet to be thoroughly validated at the human level.
Methods
To advance the field of XAI, we introduce the Alien Zoo, an engaging, web-based and game-inspired experimental framework. The Alien Zoo provides the means to evaluate usability of CFEs for gaining new knowledge from an automated system, targeting novice users in a domain-general context. As a proof of concept, we demonstrate the practical efficacy and feasibility of this approach in a user study.
Results
Our results suggest the efficacy of the Alien Zoo framework for empirically investigating aspects of counterfactual explanations in a game-type scenario and a low-knowledge domain. The proof-of-concept study reveals that users benefit from receiving CFEs compared to no explanation, both in terms of objective performance in the proposed iterative learning task and in terms of subjective usability.
We have witnessed in recent years an ever-growing volume of information becoming available in a streaming manner in various application areas. As a result, there is an emerging need for online learning methods that train predictive models on the fly. A series of open challenges, however, hinders their deployment in practice: learning as data arrive in real time, one by one; learning from data with limited ground truth information; learning from nonstationary data; and learning from severely imbalanced data, all while occupying a limited amount of memory for data storage. We propose the ActiSiamese algorithm, which addresses these challenges by combining online active learning, siamese networks, and a multi-queue memory. It develops a new density-based active learning strategy that considers similarity in the latent (rather than the input) space. We conduct an extensive study that compares the role of different active learning budgets and strategies, the performance with and without memory, and the performance with and without ensembling, on both synthetic and real-world datasets, under different data nonstationarity characteristics and class imbalance levels. ActiSiamese outperforms baseline and state-of-the-art algorithms, and is effective under severe imbalance, even when only a fraction of the arriving instances’ labels is available. We publicly release our code to the community.
https://www.sciencedirect.com/science/article/pii/S0925231222011481
A key challenge in designing algorithms for leakage detection and isolation in drinking water distribution systems is the performance evaluation and comparison between methodologies using benchmarks. For this purpose, the Battle of the Leakage Detection and Isolation Methods (BattLeDIM) competition was organized in 2020 with the aim of objectively comparing the performance of methods for the detection and localization of leakage events, relying on supervisory control and data acquisition (SCADA) measurements of flow and pressure sensors installed within a virtual water distribution system. Several teams from academia and industry submitted their solutions using various techniques, including time series analysis, statistical methods, machine learning, mathematical programming, meta-heuristics, and engineering judgment, and were evaluated using realistic economic criteria. This paper summarizes the results of the competition and analyzes the different leakage detection and isolation methods used by the teams. The competition results highlight the need for further development of methods for leakage detection and isolation, as well as the need to develop additional open benchmark problems for this purpose.
https://ascelibrary.org/doi/full/10.1061/%28ASCE%29WR.1943-5452.0001601
Numerical optimization is gradually finding its way into drinking water practice. For the successful introduction of optimization into the sector, it is important that researchers and water utility experts work together on the problem formulation. Water utilities heed the solutions provided by optimization techniques only when the underlying approach and performance criteria match their specific goals. In this contribution, we demonstrate the application of numerical optimization to a real-life problem. The Belgian utility De Watergroep is looking not only to reinforce its distribution networks but also to structurally modify the network’s topology to enhance the quality of water delivered in the future. To help the utility explore the possibilities of these far-reaching changes as flexibly as possible, an optimization problem was formulated to optimize topology and pipe sizing simultaneously for the distribution network of a Belgian city. The objective of the problem is to minimize the volume of the looped network and thereby work towards a situation where most of the customers are fed by branched extremities of the network. This objective is constrained by pressure and fire flow requirements and by thresholds on the number of customers on the branched sections. These constraints guarantee the requirements for continuity of supply under failure scenarios, as verified in the final solution. The results of the optimization process show that it is possible to design a network that is 18.5% cheaper than the currently existing network. Moreover, it turns out that the previously fully meshed topology can be restructured so that 67% of the network length is turned into branched clusters, with a meshed superstructure covering the remaining 33% of the length.
Water distribution networks (WDNs) evolve continuously over time. Changes in water demands and pipe deterioration require construction upgrades to be performed on the network during its entire lifecycle. However, strategically planning WDNs, especially for the long term, is a challenging task. This is because parameters that are essential for the description of WDNs in the future, such as climate, population and demand transitions, are characterized by deep uncertainty. To cope with future uncertainty, and to avoid overdesign or costly unplanned and reactive interventions, research is moving away from the static design of WDNs. Dynamic design approaches aim to make water networks adaptive to changing conditions over long planning horizons. A promising dynamic design approach is the staged design of WDNs, in which the planning horizon is divided into construction phases. This approach allows short-term interventions to be made while simultaneously considering the expected long-term network growth. The aim of this paper is to summarize the current state of the art in the staged design of water distribution networks. To achieve that, we critically examined relevant publications and classified them according to their shared key characteristics, such as the nature of the design problem (new or existing network design, expansion, strengthening, and rehabilitation), the problem formulation (objective functions, length of planning horizon), the optimization method, and uncertainty considerations. In the process, we discuss the latest findings in the literature, highlight the major contributions of staged design to water distribution networks, and suggest future research directions.
The percentage of the world population living in urban settlements is expected to increase to 70% of 9.7 billion by 2050. Historically, as cities grew, the development of new water infrastructure followed as needed. However, these developments had less to do with real planning than with reacting to crisis situations and urgent needs, due to the inability of urban water planners to consider long-term, deeply uncertain and ambiguous factors affecting urban development and water demand. The “Smart Water Futures: Designing the Next Generation of Urban Drinking Water Systems” or “Water-Futures” project, funded by the European Research Council (ERC), aims to develop a new theoretical framework for allocation and development decisions on drinking water infrastructure systems so that they are (i) socially equitable, (ii) economically efficient, and (iii) environmentally resilient, as advocated by the Sustainable Development Goals of the UN Agenda 2030. The ERC Synergy Grant project tackles the “wicked problem” of transitioning water distribution systems in a holistic manner, combining expertise in civil engineering, control engineering, machine learning, decision theory and environmental economics. Developing a theoretical foundation for designing smart water systems that can deliver optimally robust and resilient decisions for short- and long-term planning is one of the biggest challenges that future cities will face. This paper presents an overview of related past research on this topic, the knowledge gaps in investigating the problem in a holistic manner, and the key early outcomes of the project.
Vaquet V., Artelt A., Brinkrolf J. and Hammer B., "Taking Care of Our Drinking Water: Dealing with Sensor Faults in Water Distribution Networks", ICANN 2022
The water supply is part of the critical infrastructure, as access to clean drinking water is essential to ensure people's health. To guarantee the availability of fresh water, efficient and reliable water distribution networks are crucial. Monitoring these systems is necessary to avoid deterioration in water quality, deal with leakages and prevent cyber-physical attacks. While the installation of a growing number of sensors increases the possibilities for monitoring the system, keeping the sensors themselves under control becomes another challenge, as sensor faults negatively influence the reliability of systems that deal with leakages and monitor water quality. In this work, we aim to overcome the negative implications of sensor faults by using a sensor fault monitoring system based on three steps. First, established residual-based fault detection is applied. In a second step, we extend this method to a fault isolation technique, and finally we propose fault accommodation by standard imputation techniques and different types of virtual sensors.
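Residual-based detection and virtual-sensor accommodation can be illustrated with a minimal Python sketch. This is not the method from the paper. The sketch assumes a single monitored sensor, a linear least-squares virtual sensor that estimates its reading from two other sensors, synthetic data, and a hypothetical 5-sigma residual threshold. Fault isolation, which would need one such virtual sensor per sensor, is omitted. All function names are illustrative.

```python
import numpy as np

def fit_virtual_sensor(X_others, y_target):
    """Fit a linear virtual sensor (least squares with a bias term) that
    estimates one sensor's reading from the readings of other sensors."""
    A = np.column_stack([X_others, np.ones(len(X_others))])
    coef, *_ = np.linalg.lstsq(A, y_target, rcond=None)
    return coef

def predict(coef, X_others):
    A = np.column_stack([X_others, np.ones(len(X_others))])
    return A @ coef

def detect_faults(coef, X_others, y_observed, threshold):
    # Residual = observed reading minus virtual-sensor estimate;
    # a fault is flagged wherever its magnitude exceeds the threshold.
    residual = y_observed - predict(coef, X_others)
    return np.abs(residual) > threshold

def accommodate(coef, X_others, y_observed, flags):
    # Fault accommodation: replace flagged readings by the virtual-sensor estimate.
    y_fixed = y_observed.copy()
    y_fixed[flags] = predict(coef, X_others)[flags]
    return y_fixed

# Synthetic example (assumption): sensor y depends linearly on two other sensors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
y_train = 2.0 * X_train[:, 0] - X_train[:, 1] + 0.05 * rng.normal(size=500)
coef = fit_virtual_sensor(X_train, y_train)
threshold = 5 * np.std(y_train - predict(coef, X_train))  # hypothetical 5-sigma rule

X_test = rng.normal(size=(100, 2))
y_clean = 2.0 * X_test[:, 0] - X_test[:, 1] + 0.05 * rng.normal(size=100)
y_test = y_clean.copy()
y_test[60:] += 1.0  # simulated constant-offset sensor fault from step 60 on
flags = detect_faults(coef, X_test, y_test, threshold)
y_fixed = accommodate(coef, X_test, y_test, flags)
```

The threshold is tied to the spread of residuals on fault-free training data. Raising the factor reduces false alarms but lets smaller faults go undetected. The factor of 5 is an arbitrary choice for this sketch.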
Jakob J., Artelt A., Hasenjäger M. and Hammer B., "SAM-kNN Regressor for Online Learning in Water Distribution Networks", ICANN 2022
Water distribution networks are a key component of modern infrastructure for housing and industry. They transport and distribute water via widely branched networks from sources to consumers. In order to guarantee a working network at all times, the water supply company continuously monitors the network and takes action when necessary, e.g., reacting to leakages, sensor faults and drops in water quality. Since real-world networks are too large and complex to be monitored by a human, algorithmic monitoring systems have been developed. A popular type of such systems is residual-based anomaly detection, which can detect events such as leakages and sensor faults. For continuous high-quality monitoring, these systems need to adapt to changing demands and to the presence of various anomalies.
In this work, we propose an adaptation of the incremental SAM-kNN classifier to regression in order to build a residual-based anomaly detection system for water distribution networks that is able to adapt to any kind of change.
Artelt A., Vrachimis S., Eliades D., Polycarpou M. and Hammer B., "One Explanation to Rule them All -- Ensemble Consistent Explanations", XAI workshop at IJCAI 2022
Transparency is a major requirement of modern AI-based decision-making systems deployed in the real world. A popular approach for achieving transparency is to provide explanations. A wide variety of explanations have been proposed for single decision-making systems. In practice, particularly in complex systems, a set (i.e., an ensemble) of decisions is often used instead of a single decision. Unfortunately, explanation methods for single decision-making systems are not easily applicable to ensembles: they would yield an ensemble of individual explanations that are not necessarily consistent, and hence less useful and more difficult to understand than a single consistent explanation of all observed phenomena. We propose a novel concept for consistently explaining an ensemble of decisions locally with a single explanation: we introduce a formal concept as well as a specific implementation using counterfactual explanations.
Pittis N., Koundouri P., Samartzis P., Englezos N. and Papandreou A., "Ambiguity aversion, modern Bayesianism and small worlds" [version 1; peer review: 2 approved], Open Research Europe 2021, 1:13
The central question of this paper is whether a rational agent under uncertainty can exhibit ambiguity aversion (AA). The answer to this question depends on the way the agent forms her probabilistic beliefs: classical Bayesianism (CB) vs. modern Bayesianism (MB). We revisit Schmeidler's coin-based example and show that a rational MB agent operating in the context of a "small world" cannot exhibit AA. Hence, we argue that the motivation of AA based on Schmeidler's coin-based and Ellsberg's classic urn-based examples is poor, since they correspond to cases of "small worlds". We also argue that MB not only avoids AA but also proves to be normatively superior to CB, because an MB agent (i) avoids logical inconsistencies pertaining to the relation between her subjective probability and objective chance, (ii) resolves the problem of "old evidence", and (iii) allows psychological detachment from actual evidence, hence avoiding the problem of "cognitive dissonance". As far as AA is concerned, we claim that it may be thought of as a (potential) property of large worlds, because in such worlds MB is likely to be infeasible.
Alamanos, A.; Koundouri, P.; Papadaki, L.; Pliakou, T.; Toli, E. Water for Tomorrow: A Living Lab on the Creation of the Science-Policy-Stakeholder Interface. Water 2022, 14, 2879.
The proactive sustainable management of scarce water across vulnerable agricultural areas of Southern Europe is a timely issue of major importance, especially under the recent challenges affecting complex water systems. The Basin District of Thessaly, Greece’s driest rural region, has a long history of multiple issues of an environmental, planning, economic or administrative nature, as well as a history of conflict. For the first time, the region’s key stakeholders, including scientists and policymakers, participated in tactical meetings during the 19-month project “Water For Tomorrow”. The goal was to establish a common and holistic understanding of the problems, assess the lessons learned from past failures and co-develop a list of policy recommendations, placing them in the broader context of sustainability. These recommendations refer to enhanced and transparent information, data, accountability, cooperation/communication among authorities and stakeholders, capacity building, new technologies and modernization of current practices, reasonable demand and supply management, flexible renewable energy portfolios and circular approaches, among others. This work has significant implications for the integrated water resources management of similar southern European cases, including the Third Cycle of the River Basin Management Plans and the International Sustainability Agendas.
Currently, in the water distribution systems literature, fault detection methods are typically evaluated on benchmark water networks that do not include real-time experimental data, or on private commercial datasets, which prevents the reproducibility of the results. Moreover, realistic modeling of faults in hydraulic system components, sensors and actuators is often unavailable. In this work, we provide a framework for the application of fault-diagnosis methodologies on WaterSafe, a water network benchmark for fault diagnosis. The WaterSafe benchmark is a small-scale replica of a water transport network constructed using industrial components and devices, while the communications are implemented in a way that resembles a water utility's Supervisory Control and Data Acquisition system. A general problem formulation for fault diagnosis in water systems is provided, in accordance with the mathematical model of the benchmark. Moreover, we provide a calibrated simulation model including system, sensor and actuator faults, based on observations from the real system. Finally, we provide open access to the datasets generated from the experiments containing the aforementioned faults.
https://www.sciencedirect.com/science/article/pii/S2405896322005870
Koundouri, P., Papayiannis, G. I., Petracou, E. V., & Yannacopoulos, A. N. (2024). Consensus Group Decision making under model uncertainty with a view towards environmental policy making. Environmental and Resource Economics.
In this paper, we propose a consensus group decision-making scheme under model uncertainty consisting of an iterative two-stage procedure based on the concept of the Fréchet barycenter. Each stage consists of two steps: the agents first update their position in the opinion metric space by adopting a local barycenter characterized by the agents’ immediate interactions, and then a moderator makes a proposal in terms of a global barycenter, checking for consensus at each stage. For large heterogeneous groups, the procedure can be complemented by an auxiliary initial homogenization stage, consisting of a clustering procedure in opinion space, leading to large homogeneous groups to which the aforementioned procedure is applied. The scheme is illustrated with examples motivated by environmental economics.
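The two-step stage can be made concrete with a minimal Python sketch. The sketch restricts opinions to Euclidean space with squared distance. In that setting the Fréchet barycenter reduces to the (weighted) arithmetic mean. The sketch does not model the general opinion metric space, the model uncertainty, or the clustering-based homogenization stage. The neighbor structure, tolerance, and function names are assumptions for illustration.

```python
import numpy as np

def frechet_barycenter(points, weights=None):
    # In R^d with squared Euclidean distance, the Fréchet barycenter
    # argmin_y sum_i w_i * ||x_i - y||^2 is the weighted arithmetic mean.
    return np.average(points, axis=0, weights=weights)

def consensus(opinions, neighbors, tol=1e-3, max_stages=100):
    """opinions: (n, d) array of agent positions.
    neighbors: neighbors[i] lists the agents that agent i interacts with directly."""
    x = np.asarray(opinions, dtype=float)
    proposal = frechet_barycenter(x)
    for stage in range(1, max_stages + 1):
        # Step 1: every agent moves to the local barycenter of itself and its
        # neighbors (all agents update simultaneously from the previous positions).
        x = np.array([frechet_barycenter(x[[i] + list(nb)])
                      for i, nb in enumerate(neighbors)])
        # Step 2: the moderator proposes the global barycenter ...
        proposal = frechet_barycenter(x)
        # ... and declares consensus once every agent is within tol of it.
        if np.max(np.linalg.norm(x - proposal, axis=1)) < tol:
            return proposal, stage
    return proposal, max_stages

# Four agents interacting along a line 0-1-2-3, each holding a 2-D opinion.
opinions = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
neighbors = [[1], [0, 2], [1, 3], [2]]
proposal, stage = consensus(opinions, neighbors)
```

Here each local step averages over an agent's immediate neighbors. On a connected interaction graph, the opinions therefore contract towards a common point, and the moderator's global barycenter eventually passes the consensus check. For this symmetric example, the proposal is the midpoint (1.5, 3.0).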
Koundouri, P., Alamanos, A., Plataniotis, A., Stavridis, C., Perifanos, K., & Devves, S. (2024). Assessing the sustainability of the European Green Deal and its interlinkages with the SDGs. NATURE Climate Action, 3(1).
The European Green Deal (EGD) is the growth strategy for Europe, covering multiple domains and aiming for an equitable, climate-neutral European Union by 2050. The UN Agenda 2030, encompassing 17 Sustainable Development Goals (SDGs), establishes the foundation for a global sustainability transition. The integration of the SDGs into the EGD is an overlooked issue in the literature, despite Europe’s slow progress towards the sustainability targets. We employed a machine-learning text-mining method to evaluate the extent of SDG integration within the 74 EGD policy documents published during 2019–2023. The findings reveal a substantial alignment of EGD policies with SDGs related to clean energy (SDG7), climate action (SDG13), and sustainable consumption and production (SDG12). In contrast, there is a significant underrepresentation in areas related to social issues such as inequalities, poverty, hunger, health, education, gender equality, decent work, and peace, as indicated by lower alignment with SDGs 1, 2, 3, 4, 5, 8, 10, and 16. Temporal trends suggest a marginal increase in the attention given to environmental health (especially water and marine life) and gender equality. Furthermore, we illustrate the alignment of EGD policies with the six essential sustainability transformations proposed by the Sustainable Development Solutions Network (SDSN) in 2019 for the operationalization of the SDGs. The results indicate that, apart from the prevalence of “Energy Decarbonization and Sustainable Industry”, all areas have received attention, except for “Health, Wellbeing and Demography”. The findings call for a more integrated approach to address the complete spectrum of sustainability in a balanced manner.
Koundouri, P., Halkos, G., Landis, C. F. M., et al. (2023). Ecosystem services valuation for supporting sustainable life below water. NATURE: Sustain Earth Reviews, 6, 19.
The significance of the SDGs lies in their holistic, global and interdisciplinary nature. At the same time, however, this nature poses significant challenges, as it is difficult to bridge the breadth of different aspects included in the SDGs, such as the environmental and the socio-economic, in theory, in practical application and in policymaking. SDG14 on “life below water” is quite a holistic concept, as it refers to a natural/environmental system (seas) that supports several marine economic activities and ecosystem values and is associated with strong social and cultural characteristics of the local populations, affecting the ways they manage marine areas. The main challenges for the achievement of a sustainable life below water are analyzed, and ways forward are discussed. Holistic and well-coordinated approaches considering the complex nature of SDG14 are necessary. Moreover, we discuss the role of economic instruments that can bridge environmental and socio-economic aspects towards a more sustainable life below water. In particular, the potential of environmental valuation as a means to better inform SDG policies is discussed, using the example of SDG14. The currently established frameworks for countries’ sustainability reporting lack metrics focusing on the economic impact of the environment and on the degradation or restoration rates of ecosystem services, including ocean and marine ecosystems. Acknowledging and quantifying the costs and benefits of ocean and marine ecosystems can lead to more effective interventions (such as ocean pollution prevention, climate change mitigation, fishing exploitation, and biodiversity and coral reef preservation) and a better understanding of human-environmental dynamics. This, in turn, strengthens coordinated management and cooperation.
Koundouri, P. (2023). Urgent call for comprehensive governmental climate action against wildfires in Greece. NATURE: Climate Action, 2(1).
In recent decades, Greece has experienced devastating wildfires, particularly during the summer months. These wildfires have intensified in frequency and severity, which experts largely attribute to the impacts of climate change. Extended periods of drought, soil aridity, persistent heatwaves, and intensified winds have transformed forests into highly vulnerable areas susceptible to even the slightest spark. In many cases, the fires have been uncontrollable and can be described as “megafires.” These megafires are enormous in scale and intensity and pose an increasingly severe threat to Greece’s landscape, necessitating a comprehensive response to protect both the environment and citizens’ safety and well-being.
This commentary highlights the need for comprehensive governmental climate action in response to Greece’s wildfires. It discusses the destruction of biodiversity and presents a holistic approach to fire management. Collaboration and the SDGs are emphasized as key elements in addressing climate change’s consequences.
Chatzistamoulou, N., & Koundouri, P. (2024). Is green transition in Europe fostered by energy and environmental efficiency feedback loops? the role of eco-innovation, Renewable Energy and Green Taxation. Environmental and Resource Economics.
Green transition is at the core of the European policy agenda to achieve the ambitious goal of climate neutrality following the launch of the European Green Deal. The cornerstone of Europe's new growth strategy is resource efficiency, which focuses on shifting to a more sustainable production paradigm by conserving scarce resources and by prioritizing enhanced environmental performance. Scattered efforts to investigate the drivers of resource efficiency measures have shed light on the key drivers; however, they consider resource efficiency measures in isolation, neglecting the feedback loops influencing green transition. Therefore, we develop a conceptual framework to study green transition as a system of resource efficiency measures affected by feedback loops, path dependence, green technologies, and green policy tools. We mobilize the analysis by devising a unique balanced panel covering the EU-28 from 2010 through 2019, including policy efforts paving the way for green transition. Econometric results based on a system of fractional probit models indicate that resource efficiency measures are intertwined via feedback loops, especially in the case of environmental efficiency. Green technologies affect green transition; however, rebound effects emerge in the case of energy efficiency. Past performance affects current levels, pushing towards divergence. Evidence suggests that green taxation fosters energy efficiency whereas it hinders environmental efficiency. The asymmetric operation of feedback loops and green taxation on energy and environmental efficiency highlights that horizontal policies hinder rather than foster green transition. This study contributes to SDGs 7, 12, 13 and 16.