A Context-aware Reminding System for Daily Activities of Dementia Patients

Hua Si, Seung Jin Kim, Nao Kawanishi, Hiroyuki Morikawa
Department of Frontier Informatics, The University of Tokyo
Kashiwanoha 5-1-5, Kashiwa-shi, Chiba, Japan
{sihua, nayaksj, river24, mori}@mlab.k.u-tokyo.ac.jp

Abstract

Older people with dementia often experience a decline in short-term memory and forget what to do next when carrying out activities of daily living (ADLs), such as tea-making and tooth-brushing. They therefore need caregivers to remind them what to do to complete these activities. However, the steady growth of the aging population is making the relative shortage of traditional care resources increasingly serious. In this paper, we propose a prototype called CoReDA (Context-aware Reminding system for Daily Activities) that helps elderly people with dementia complete different ADLs in place of caregivers. Using the wireless sensor node PAVENET, CoReDA obtains information about the tools a person uses in different ADLs. Based on this information, CoReDA uses the TD(λ) Q-Learning technique to provide personalized guidance for completing ADLs.

1. Introduction

The steady growth of the aging population and the relative shortage of traditional care resources are placing an unprecedented demand on the emerging technology of ubiquitous computing to help elderly people through their activities of daily living (ADLs). One major opportunity is a ubiquitous guidance system that assists elderly people with dementia in completing their ADLs. Older people with dementia often experience a decline in short-term memory and forget how to complete activities such as tea-making and tooth-brushing. When they encounter difficulties in completing an ADL, they need caregivers to prompt the next step so that they can progress in the activity. As the level of dementia worsens, caregivers experience a greater feeling of burden because the demands of caregiving increase. A ubiquitous guidance system that reminds elderly people in place of caregivers can significantly reduce this burden.

1.1. Related works

Several guidance systems support ADL completion for elderly people with dementia. Boger et al. [1] developed a planning system based on a Markov Decision Process (MDP) to assist hand washing, using a video camera to track the user's hands. Pollack et al. [3] use dynamic Bayesian networks as an underlying model to coordinate pre-planned events so that scheduled tasks are executed without interfering with one another. Philipose et al. [2] recognize elderly people's ADLs using RFID and probabilistic inference. However, such systems suffer from two problems. First, they are based solely on pre-planned ADL routines and do not consider the preferences of different users. Second, they are designed for specific ADLs and are difficult to generalize to new ones.

1.2. Design criteria

To thoroughly understand the needs of caregivers and care recipients, we cooperate with NPO Nenrin Support, which provides care to 25 dementia patients aged 72 to 91. From our interviews with specialists and caregivers and our observations of care recipients, we found two important principles of dementia care: 1) Dementia patients should keep doing ADLs the way they did before. A guidance system must therefore be able to learn each patient's ADL routines. 2) Only minimal prompts should be given. This ensures that elderly people with dementia try their best to exercise their brains, which delays the deterioration of their dementia. A further requirement from caregivers is that the system should not depend on explicit feedback from caregivers or care recipients. Based on the problems of previous work and the requirements of caregivers, we consider the following criteria important for the design of our system:

- It should detect the user's progress through their ADLs.

- It should learn and provide personalized guidance to different users.
- It should provide the minimal prompt the user needs.
- It should generalize easily to other ADLs.
- It should operate without explicit feedback from care recipients or caregivers.

1.3. Overview of CoReDA

Based on the five criteria above, we propose a ubiquitous ADL guidance system called CoReDA (Context-aware Reminding system for Daily Activities) that helps elderly people with dementia complete different ADLs in place of caregivers. CoReDA obtains information about the tools a person uses while progressing through an ADL from the wireless sensor node PAVENET [5], an approach that generalizes easily to other ADLs. Based on the tool usage information, CoReDA uses the TD(λ) Q-Learning technique to learn different users' ADL routines and to provide personalized, minimal guidance for ADL completion. Since Q-Learning has a reward mechanism, it does not require explicit feedback from care recipients or caregivers.

A typical scenario of CoReDA for an elderly person with dementia is shown in Figure 1. Mr. Tanaka always makes tea in four steps: 1) he takes tea leaves from the tea-box and puts them into the kettle, 2) pours hot water from the electronic pot into the kettle, 3) pours tea into the tea-cup, and 4) drinks a cup of tea. CoReDA monitors his use of tools in each step by analyzing sensor data from the PAVENET attached to every tool. Based on the tool usage information, CoReDA uses Q-Learning to learn Mr. Tanaka's personalized tea-making routine. When Mr. Tanaka's dementia becomes worse, he may incorrectly take the tea-cup after 1) putting tea leaves into the kettle. In this case, CoReDA prompts him to pour hot water from the electronic pot by using the four methods shown in Figure 1 (Time: 13 s). When Mr. Tanaka correctly uses the electronic pot, he is praised (Time: 23 s). If he forgets what to do after 3) pouring tea into the tea-cup and does nothing for 30 seconds (footnote 1), CoReDA prompts him to drink a cup of tea by using the three methods shown in Figure 1 (Time: 71 s). CoReDA can learn different routines for different users and ADLs.

Footnote 1: 30 s is just an example here. This time should be determined from statistical data on how long a user uses this tool.

[Figure 1. A typical scenario of CoReDA. Timeline with columns Time (s), ADL Step, and Reminding. At 13 s (tea-cup taken instead of the electronic pot): 1. "Please use electronic-pot." 2. Red LED on teacup. 3. Green LED on pot. 4. Image of pot is shown. At 23 s: "Excellent!" At 71 s (nothing done for 30 s): 1. "Please use teacup" 2. Green LED on teacup. 3. Image of teacup is shown. Finally: "Excellent!"]

The rest of this paper is organized as follows. Section 2 explains the architecture and implementation of CoReDA. Section 3 discusses the preliminary evaluation of CoReDA. Section 4 gives conclusions and future work.

2. Architecture of CoReDA

As depicted in Figure 2, CoReDA consists of three subsystems: sensing, planning, and reminding.

[Figure 2. Architecture of CoReDA. Diagram labels: The Care Recipient; The Process of Tea-making; Wireless Sensor Node (PAVENET module); Tool ID; Sensor Data; Sensing Subsystem; Current Tool ID; Tool Usage History Data; Planning Subsystem (Q-Learning); Forget Next Step?; Next Tool ID; Reminding Level; Reminding Subsystem; LED Blinking; Text Message; Tool Picture.]

2.1. Sensing subsystem

The sensing subsystem extracts the user's current ADL step by detecting tool usage from the sensor nodes attached to the tools. For instance, in the tea-making scenario, we attach a PAVENET with a 3-axis accelerometer to the tea-box, kettle, and tea-cup, and a PAVENET with a pressure sensor to the electronic pot. When a tool is used, its ID is sent to the server, from which we extract the StepID that indicates the current step of the ADL. The sequence of StepIDs is stored for the planning subsystem. We implement the sensing subsystem on PAVENET (Table 1), which is attached to each tool.

Table 1. Hardware of PAVENET

| Component | Specification |
|---|---|
| CPU | Microchip PIC18LF4620 |
| RAM | 4 KB |
| ROM | 64 KB |
| Wireless | ChipCon CC1000 |
| I/O | UART, GPIO, I2C |
| Peripherals | Four LEDs, real-time clock, external EEPROM (16 KB) |
| Sensors | 3-axis accelerometer, pressure, brightness, temperature, motion |

For each tool, its ID and usage information are the most important data. We use the uid (unique ID) of the PAVENET as the ID of the tool it is attached to. The StepID is defined as the ID of the tool that is mainly used in that step. We also define StepID 0 to indicate that nothing has been done for a long time. Tool usage is detected from the sensor data of the PAVENET. Table 2 shows the sensors and tools used in two ADLs: Tooth-brushing and Tea-making. Each sensor is sampled 10 times per second. If three of these 10 samples exceed a predefined threshold, the tool is considered to be in use and its ID is sent to the server. This mechanism protects detection against accidental operation. Since the programs on different PAVENETs are almost the same, it is very convenient to generalize the sensing subsystem to other ADLs: all that is needed is to attach one PAVENET to a tool and configure its uid as the tool ID.

2.2. Planning subsystem

The planning subsystem learns a user's ADL routines from the results of the sensing subsystem and gives appropriate prompts to the reminding subsystem based on the user's routines and his current ADL step. In CoReDA, the input of the planning subsystem is a series of pairs <StepID_{i-1}, StepID_i> (where i is the index into the StepID sequence), composed from the output of the sensing subsystem.
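The detection rule of Section 2.1 and the pairing of StepIDs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each sample is taken to be a single scalar already derived from the sensor (for example an acceleration magnitude or a pressure reading), "three of these 10 samples" is read as "at least three", and the names `tool_in_use` and `step_pairs` as well as the threshold values are hypothetical. None of this is the PAVENET firmware.

```python
from typing import Iterable, List, Sequence, Tuple

SAMPLES_PER_WINDOW = 10  # one second of data at 10 samples per second (from the text)
MIN_HITS = 3             # samples that must exceed the threshold (read as "at least 3")


def tool_in_use(window: Sequence[float], threshold: float) -> bool:
    """Return True if at least MIN_HITS samples in a one-second window exceed the threshold."""
    if len(window) != SAMPLES_PER_WINDOW:
        raise ValueError("expected exactly one second of samples")
    hits = sum(1 for sample in window if sample > threshold)
    return hits >= MIN_HITS


def step_pairs(step_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Turn a StepID sequence into the pairs <StepID_{i-1}, StepID_i>."""
    ids = list(step_ids)
    return list(zip(ids, ids[1:]))
```

For example, a window with three samples above the threshold counts as usage, while a single spike from an accidental bump does not. A StepID sequence such as 1, 2, 3, 4 (hypothetical tool IDs) yields the pairs (1, 2), (2, 3), (3, 4).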
We then use the Q-Learning algorithm [6] to learn the routine of an ADL, which is called a policy in the literature on Q-Learning. We start from a random policy. The more input series are learned, the more closely the policy matches the user's routine for this ADL. When the learning converges (footnote 2), the planning subsystem has obtained the user's personalized routine for the ADL. After that, the planning subsystem can predict the user's next step based on the policy and the current step. As output of the planning subsystem, prompts are sent to the reminding subsystem. Each prompt includes the ID of the tool that should be used in the next step and the reminding level (minimal or specific).

Table 2. Sensors and tools for each ADL step

| ADL | ADL step | Sensor and tool |
|---|---|---|
| Tooth-brushing | Put toothpaste on the brush | Accelerometer on paste tube |
| Tooth-brushing | Brush the teeth | Accelerometer on brush |
| Tooth-brushing | Gargle with water | Accelerometer on cup |
| Tooth-brushing | Dry with a towel | Accelerometer on towel |
| Tea-making | Put tea-leaf into kettle | Accelerometer on tea-box |
| Tea-making | Pour hot water into kettle | Pressure sensor on pot |
| Tea-making | Pour tea into tea cup | Accelerometer on kettle |
| Tea-making | Drink a cup of tea | Accelerometer on teacup |

We use the TD(λ) Q-Learning algorithm in the Reinforcement Learning (RL) Toolbox 2.0 [4] to implement our planning subsystem. An RL model consists of three components: a set of states S, a set of actions A, and a set of scalar rewards $R : S \times A \to \mathbb{R}$. If the system perceives its state s_i and takes action a_i, it transitions to the future state s_{i+1} with probability P(s_i, a_i, s_{i+1}) and receives a reward R(s_i, a_i, s_{i+1}). In our system, a state s_i = <StepID_{i-1}, StepID_i> is the pair of the previous and current StepID. An action a_i = <ToolID_{i+1}, Level_{i+1}> is the prompt that will be sent to the reminding subsystem, which includes the ID of the tool that should be used in the next step and the reminding level. The learning procedure is depicted in Figure 3.
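To make the state, action, and update more concrete, here is a minimal Python sketch of a tabular Q-learning update with eligibility traces, in the style of Watkins's Q(λ). It does not use or mirror the RL Toolbox 2.0 API, and it is not the authors' implementation. The reward values 1000, 100, and 50 are those the paper defines later in this section; the rule that a prompt for the wrong tool earns 0, the hyperparameters `alpha`, `gamma`, and `lam`, and all names (`reward`, `QLambda`) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

State = Tuple[int, int]   # <StepID_{i-1}, StepID_i>
Action = Tuple[int, str]  # <ToolID_{i+1}, Level_{i+1}>; level is "minimal" or "specific"


def reward(action: Action, next_step: int, terminal_step: int) -> float:
    """Reward values 1000 / 100 / 50 follow Section 2.2.

    Assumption: a prompt naming a tool the user did not use next earns 0.
    """
    tool, level = action
    if tool != next_step:
        return 0.0
    if next_step == terminal_step:
        return 1000.0
    return 100.0 if level == "minimal" else 50.0


class QLambda:
    """Tabular Watkins-style Q(lambda) learner (illustrative hyperparameters)."""

    def __init__(self, actions: List[Action], alpha: float = 0.5,
                 gamma: float = 0.9, lam: float = 0.8) -> None:
        self.actions = actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q: Dict[Tuple[State, Action], float] = defaultdict(float)
        self.trace: Dict[Tuple[State, Action], float] = defaultdict(float)

    def greedy(self, state: State) -> Action:
        """Return the prompt with the highest learned value in this state."""
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s: State, a: Action, r: float, s_next: State,
               a_next: Optional[Action], done: bool) -> None:
        """Apply one TD update to every state-action pair with a live trace."""
        best_next = 0.0 if done else max(self.q[(s_next, b)] for b in self.actions)
        next_is_greedy = (not done) and self.q[(s_next, a_next)] == best_next
        delta = r + self.gamma * best_next - self.q[(s, a)]
        self.trace[(s, a)] += 1.0  # accumulating eligibility trace
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            if next_is_greedy:
                self.trace[key] *= self.gamma * self.lam  # decay the trace
            else:
                del self.trace[key]  # cut traces after an exploratory or final step
```

As a worked example with the default values above and hypothetical tool IDs 1 to 4: updating state (1, 2) with the prompt (3, "minimal") and reward 100 raises its value to 50. A following terminal update of state (2, 3) with the prompt (4, "minimal") and reward 1000 sets that pair to 500 and, through the decayed trace of 0.72, raises the earlier pair to about 410. The trace is what lets the large completion reward reach earlier prompts within a single run of the ADL.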
Suppose the system is in the state s_i = <StepID_{i-1}, StepID_i>. According to the current policy, an action a_i = <ToolID_{i+1}, Level_{i+1}> (a prompt in our system) is sent to the reminding subsystem (steps 1, 2, and 3 in Figure 3). The user receives the prompt and moves to Step_{i+1}, so the state changes to s_{i+1} = <StepID_i, StepID_{i+1}> (step 4). The learner receives this information and computes a reward r_i = R(s_i, a_i, s_{i+1}) from the Reward Function (step 5). Finally, the policy is updated (step 6). The system starts from a random policy, and the more it learns, the more precise the policy becomes.

[Figure 3. Learning procedure. Diagram labels: Planning Subsystem with Policy, Agent, Learner, and Reward Function; numbered arrows 1 to 6 carrying s_i, a_i, s_{i+1}, and the tuple <s_i, a_i, s_{i+1}>, ending in "Update policy"; Reminding Subsystem and the User: "The user changes his ADL step in accord with the prompt from Reminding Subsystem."]

The Reward Function plays an important role in the learning procedure, because the goal of Q-Learning is to find a policy that maximizes the cumulative reward $R = \sum_{i=1}^{n} \beta^{i} r_i$, where $r_i$ denotes the reward of step i and $\beta$ is a convergence factor. By designing a proper Reward Function, we can learn a user's ADL routines and the minimal prompts s/he needs without explicit feedback. We define our Reward Function as follows:

- For the terminal step of an ADL, a large reward of 1000 is given to encourage completion of the ADL.
- For intermediate steps, a larger reward of 100 is given when a minimal reminder is provided, and a smaller reward of 50 is given when a specific reminder is provided. This encourages the user to exercise his/her brain instead of depending on the system.

Footnote 2: A discussion is given in Section 3.2.

2.3. Reminding subsystem

The reminding subsystem receives prompts from the planning subsystem, which include the ID of the tool that should be used in the next step and the reminding level (minimal or specific), and informs users what to do next. It has three methods of informing users: text message, tool picture, and LED blinking. The text message and tool picture are shown on a display. LED blinking is implemented on the PAVENET attached to the tool. The green LED indicates that the tool should be used. The red LED indicates that the tool is being used incorrectly.

Two situations trigger a reminder: 1) the user does not use the tool s/he should use for a certain period of time, or 2) the user incorrectly uses another tool. In the first case, the picture of the tool that should be used and a text message are shown on a display in front of the user, and the green LED on that tool blinks. In the second case, the picture, text message, and green LED are still given, and in addition the red LED on the tool the user is currently using blinks. Two reminding levels are provided: minimal gives a short message (e.g., "use tea-cup") and fewer blinks; specific gives a long message (e.g., "Mr. Kim, use the black tea-box in front of you.") and more blinks.

3. Preliminary Evaluation of CoReDA

To examine the usability of our system, we implemented two simple scenarios, Tooth-brushing and Tea-making, as mentioned in Section 2, and preliminarily evaluated CoReDA on three aspects: 1) extract precision of tool usage, which examines the accuracy of detecting tool usage from the raw sensor data; 2) learning curve, which shows the properties of the TD(λ) Q-Learning algorithm; 3) predict precision of ADL step, which examines whether the predictions are accurate enough in practice.

3.1. Extract precision of tool usage

Using the sensors and tools described in Section 2.1, we collected 320 samples of the two ADLs, an average of 40 samples for each tool. A sample tests, for example, whether picking up the tea-box and taking tea leaves from it is extracted as the ADL step "Put tea-leaf into kettle". Table 3 shows the results of our experiment.

Table 3. Extract precision of ADL step

| ADL | ADL step | Extract precision |
|---|---|---|
| Tooth-brushing | Put toothpaste on the brush | 90% |
| Tooth-brushing | Brush the teeth | 100% |
| Tooth-brushing | Gargle with water | 100% |
| Tooth-brushing | Dry with a towel | 85% |
| Tea-making | Put tea-leaf into kettle | 100% |
| Tea-making | Pour hot water into kettle | 80% |
| Tea-making | Pour tea into tea cup | 100% |
| Tea-making | Drink a cup of tea | 90% |

Table 3 shows that the precision for "Dry with a towel" and "Pour hot water into kettle" is relatively low.

This is because these two steps are shorter in duration than the other steps.

3.2. Learning curve

We collected 120 training samples of each ADL for the TD(λ) Q-Learning algorithm. One training sample is one complete run of an ADL. For instance, for the Tea-making ADL, one training sample consists of the four consecutive steps: 1) taking tea leaves from the tea-box and putting them into the kettle, 2) pouring hot water from the electronic pot into the kettle, 3) pouring tea into the tea-cup, and 4) drinking a cup of tea. The results of our experiments are shown in Figure 4. The experiments were run on an IBM ThinkPad X32 with a Pentium(R) M 1.8 GHz CPU and 1.5 GB of RAM.

[Figure 4. Learning curve.]

Figure 4 shows that, with the convergence condition set to 95%, the TD(λ) Q-Learning algorithm converges after 49 iterations for Tooth-brushing and after 56 iterations for Tea-making. With the convergence condition set to 98%, it converges after 91 iterations for Tooth-brushing and after 98 for Tea-making. We could also set the parameters (convergence condition, learning rate, etc.) so that learning keeps updating instead of converging. CoReDA would then always learn the user's newest routines, but this is clearly not appropriate for elderly people whose dementia is becoming worse.

3.3. Predict precision

After learning the user's routines, we need to verify the correctness of the reminders. Two situations trigger a reminder: 1) the user does not use the tool s/he should use for a certain period of time, or 2) the user incorrectly uses another tool. We collected 30 test samples for each ADL, in which the two situations are examined equally. Table 4 shows the results of our experiments.

Table 4. Predict precision of ADL step

| ADL | ADL step | Predict precision |
|---|---|---|
| Tooth-brushing | Put toothpaste on the brush | n/a |
| Tooth-brushing | Brush the teeth | 100% |
| Tooth-brushing | Gargle with water | 100% |
| Tooth-brushing | Dry with a towel | 100% |
| Tea-making | Put tea-leaf into kettle | n/a |
| Tea-making | Pour hot water into kettle | 100% |
| Tea-making | Pour tea into tea cup | 100% |
| Tea-making | Drink a cup of tea | 100% |

Table 4 contains no results for predicting the first step of each ADL, because the first step is needed to trigger the start of prediction.

4. Conclusions

In this paper, we presented a prototype called CoReDA (Context-aware Reminding system for Daily Activities) that helps elderly people with dementia complete their ADLs. Using the wireless sensor node PAVENET, CoReDA obtains information about the tools a person uses in different ADLs. Based on this information, CoReDA uses the TD(λ) Q-Learning technique to provide personalized and minimal guidance for ADL completion.

Several challenges remain. 1) Multi-routine planning: for each elderly person, our system can learn only one routine per ADL. However, for some ADLs, such as dressing, one user may have multiple routines. Support for multiple routines is therefore necessary even for a single user. 2) Fast learning: our system takes a relatively long time to learn a routine, and in a practical system the elderly user may not be patient enough to wait for it. We therefore need to improve our learning algorithm to make it faster. 3) Elderly-friendly design: since elderly people with dementia are quite different from general users, many design issues must be considered for a practical system.

References

[1] J. Boger and J. Hoey. A planning system based on Markov decision processes to guide people with dementia through activities of daily living. Transactions on Information Technology in Biomedicine, 10(2):323-333, 2006.
[2] M. Philipose and K. P. Fishkin. Inferring activities from interactions with objects. Pervasive Computing, 3(1):50-57, Oct.-Dec. 2004.
[3] M. Pollack. Autominder: An intelligent cognitive orthotic system for people with memory impairment. Robot. Auton. Syst., 44(3):273-282, 2003.
[4] RL Toolbox 2.0. http://www.igi.tugraz.at/ril-toolbox/.
[5] S. Saruwatari and T. Kashima. PAVENET: A hardware and software framework for wireless sensor networks. Trans. Society of Instrument and Control Engineers, E(1):74-84, 2005.
[6] R. Sutton and A. Barto. Reinforcement Learning. MIT Press, 1998.