
Introduction

This is a slightly more elaborate example. In the following pages we will show you how to create a model for estimating the cost of medical treatment and how to display the data in charts using D3. Finally, the aim is to show how powerful machine learning algorithms can be implemented in the browser.

There is some math in the example, and it is explained, but you can skip over those parts if you are not ready to read through them. Hopefully I have kept the explanation simple enough to understand.

Code example

This is a simple code example that creates a JavaScript array and displays it.
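A minimal sketch of such an example, assuming a variable named `array` filled with placeholder values:

```javascript
// Create an array of five numbers (the values are arbitrary placeholders).
let array = [1, 2, 3, 4, 5];

// Outside the interactive page, console.log shows the array.
console.log(array);

// Inside the page, the value of the last expression is what gets displayed.
array;
```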

 
 
     

What just happened? If you execute the code above, it will display the array just created.

Note: the value of the last expression, array, is what is displayed.

Some simple notes

  • You can edit the example above and run it as many times as you want
  • The code changes its theme once you edit it, to show that it has been edited
  • A variable created within a !tryit snippet using let only exists in that snippet




Page - 1




Introduction to medical cost modeling

This is an example of optimization that is useful in certain kinds of modeling for estimating the cost of a service. The cost we are trying to estimate is that of treating a medical condition. The example does not use a real medical condition or real data, as will be explained later.

Some utility functions

  1. expL takes the exponent of the elements of an array
  2. rect acts like a rectifier in an electronic circuit: if val is negative it returns 0, otherwise it returns the value. clamp is similar: it returns 0 for negative values and 1 for positive values
  3. max returns the max value of a list
  4. stdPdiv gets the standard deviation using only the positive values of an array
  5. range creates an array with elements [0, 1, 2, … n-1]
  6. zip takes 2 lists (list1, list2) and returns a new array with the length of list1, where element i is a 2 element array [ list1[i], list2[i] ]
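The utilities listed above could be sketched as follows. The names come from the list; the bodies are assumptions, in particular the population form of the standard deviation in `stdPdiv` and the choice to map 0 to 0 in `clamp`.

```javascript
// Hypothetical implementations; only the names come from the list above.

// expL: exponent of every element of an array
const expL = (list) => list.map((v) => Math.exp(v));

// rect: rectifier, negative values become 0
const rect = (val) => (val < 0 ? 0 : val);

// clamp: 0 for negative values, 1 for positive values (0 maps to 0, an assumption)
const clamp = (val) => (val > 0 ? 1 : 0);

// max: largest value of a list
const max = (list) => list.reduce((m, v) => (v > m ? v : m), -Infinity);

// stdPdiv: standard deviation over the positive values only
// (population form, dividing by the count; an assumption)
const stdPdiv = (list) => {
  const pos = list.filter((v) => v > 0);
  if (pos.length === 0) return 0;
  const mean = pos.reduce((s, v) => s + v, 0) / pos.length;
  const variance = pos.reduce((s, v) => s + (v - mean) ** 2, 0) / pos.length;
  return Math.sqrt(variance);
};

// range: [0, 1, 2, ..., n-1]
const range = (n) => Array.from({ length: n }, (_, i) => i);

// zip: pairs element i of list1 with element i of list2
const zip = (list1, list2) => list1.map((v, i) => [v, list2[i]]);

console.log(range(4));                              // [0, 1, 2, 3]
console.log(zip([1, 2, 3], ["a", "b", "c"]));       // [[1, "a"], [2, "b"], [3, "c"]]
console.log(stdPdiv([-5, 2, 4, 4, 4, 5, 5, 7, 9])); // 2 (the -5 is ignored)
```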
 
 
     




Page - 2




Modeling Medical treatment

This more elaborate example will investigate a data science problem: modeling the cost of medical treatment for a particular medical condition. For this example we will be creating some synthetic data rather than using actual medical claims data. The model assumes that not all patients are the same; some have other underlying medical issues that make the treatment more expensive. We will call these risk factors. Examples of risk factors might be age (older than 65, or younger than 5), pregnancy, or high blood pressure. These risk factors increase the cost of treating the condition. The problem is that we only have the raw cost of treatment data and knowledge of which underlying conditions each patient has. What we do not have is how those conditions affect the cost of treatment.

The purpose of the remaining sections is to model the cost of treatment and discover how these factors change it. The model has the following parts:

  1. Average base cost, the cost incurred by most patients
  2. A number of additional complications that increase the cost of treatment. An example of those may be age, if you are older or very young. A simple model is to assume that each condition increases the cost by a certain percent.
    • Cost = Base * factor1 * factor2 * factor3..., where factor1, factor2, ... are the percentage increases in the cost of treating a patient with those additional factors.
    • This can also be written in exponential form
    • Cost = exp( b + f1 + f2 + f3 ) where b = log(Base), f1 = log(factor1), and so on
  3. The values of b, f1, f2… (also referred to as cost factors) are unknown, but we have data for the cost of service for thousands of patients. We will use the data to estimate the values of the cost factors.
  4. This is similar to a regression problem
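To see that the multiplicative and exponential forms agree, here is a small worked example. The numbers (a base cost of 1000, factor1 adding 20% and factor2 adding 50%) are made up for illustration.

```javascript
// Hypothetical numbers: base cost 1000, factor1 adds 20%, factor2 adds 50%.
const base = 1000;
const factor1 = 1.2;
const factor2 = 1.5;

// Multiplicative form: Cost = Base * factor1 * factor2
const costProduct = base * factor1 * factor2;

// Exponential form: b = log(Base), f1 = log(factor1), f2 = log(factor2)
const b = Math.log(base);
const f1 = Math.log(factor1);
const f2 = Math.log(factor2);
const costExp = Math.exp(b + f1 + f2);

// Both values are approximately 1800 (up to floating point rounding).
console.log(costProduct, costExp);
```

Working with b, f1, f2 turns the product into a sum, which is what makes the later derivatives simple.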

The object that models the cost of treatment has the following attributes:

  1. real the actual cost of treatment
  2. factorFlag an array of 1 or 0 for each of the cost factors: 1 = factor present for the patient, 0 = factor not present
  3. real holds the real cost. Since we do not actually have real data, we will create some simulated data:
    • All patients have the base factor b, in other words factor0 (100% probability)
    • The other factors each have a probability of occurring: factor1 has a 30% chance, factor2 has a 10% chance and so on. This is allocated using the fillFactor() function.
  4. the current best estimate of the cost, predicted from the factor estimates
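A minimal sketch of such an object is shown here. The attribute names `real` and `factorFlag`, the function name `fillFactor()` and the 100%, 30% and 10% probabilities come from the description above; the class name `Episode`, the attribute `estimate`, the method `predict` and the injectable `rand` parameter are hypothetical.

```javascript
// Probability that each factor is present: factor0 always, factor1 30%, factor2 10%.
const FACTOR_PROB = [1.0, 0.3, 0.1];

class Episode {
  // `rand` is injectable so the behaviour can be checked with a fixed value.
  constructor(rand = Math.random) {
    this.factorFlag = this.fillFactor(rand); // 1 = factor present, 0 = absent
    this.real = 0;     // actual (here: simulated) cost, set later
    this.estimate = 0; // hypothetical name for the current best cost estimate
  }

  // Roll once per factor; factor0 has probability 1, so it is always set.
  fillFactor(rand) {
    return FACTOR_PROB.map((p) => (rand() < p ? 1 : 0));
  }

  // Predicted cost exp(f0*flag0 + f1*flag1 + ...) for the given factor estimates.
  predict(f) {
    const sum = this.factorFlag.reduce((s, flag, j) => s + flag * f[j], 0);
    this.estimate = Math.exp(sum);
    return this.estimate;
  }
}

const episode = new Episode();
console.log(episode.factorFlag); // the first flag is always 1
```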
 
 
     




Page - 3




Create synthetic treatment data

In this section we will create some synthetic data for our analysis. For this we need an underlying model. A common choice is what we call the multiplicative model, in which each risk factor increases the cost of treatment by some percentage. So we have a base cost of treatment, which we will call factor0, and the other factors are factor1, factor2 and so on.

Every patient will have factor0 and the total cost of treatment. Each patient may or may not have factor1, and similarly for all the other factors. We choose which factors a patient has by rolling dice, i.e. each factor has a probability associated with it.

Finally, we randomly increase or decrease the cost of treatment to reflect real treatment cost data.

Set up some constants

  1. EPI_COUNT - number of test treatment records to create (we call each of those an episode)
  2. ITER - number of iterations used to compute the factors
  3. EPSILON - increment size for the gradient descent
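The steps above could be sketched as follows. The constant names come from the list; every value, the names `TRUE_FACTORS`, `FACTOR_PROB`, `NOISE` and `makeEpisode`, and the uniform noise model are assumptions made for illustration.

```javascript
// All values below are illustrative assumptions.
const EPI_COUNT = 1000; // number of synthetic episodes
const ITER = 500;       // gradient descent iterations (used in a later step)
const EPSILON = 0.01;   // gradient descent step size (used in a later step)

const TRUE_FACTORS = [1000, 1.3, 1.8]; // base cost, factor1, factor2
const FACTOR_PROB = [1.0, 0.3, 0.1];   // chance that each factor is present
const NOISE = 0.1;                     // cost varies by up to +/-10%

function makeEpisode(rand = Math.random) {
  // Roll once per factor to decide whether the patient has it.
  const factorFlag = FACTOR_PROB.map((p) => (rand() < p ? 1 : 0));
  // Multiply together the factors that are present.
  const cost = TRUE_FACTORS.reduce((c, f, j) => (factorFlag[j] ? c * f : c), 1);
  // Randomly increase or decrease the cost.
  const real = cost * (1 + NOISE * (2 * rand() - 1));
  return { factorFlag, real };
}

const episodes = Array.from({ length: EPI_COUNT }, () => makeEpisode());
console.log(episodes.length, episodes[0]);
```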
 
 
     
 
 
     

Some space to inspect data

 
 
     

Display Histogram

 
 
     




Page - 4




Predict factors (using gradient descent)

We predict the factors iteratively using gradient descent. But before we do that, let us restate the problem to make the math a bit more elegant and actually simpler.

Exponential function

If we have the following expression (rule of exponentials)

%y = e^a * e^b% can be rewritten as %y = e^{a+b}%

Secondly, any value `x` can be rewritten as an exponential `x = e^{log x}`


Therefore:

%f1 = log("factor1")% and %f2 = log("factor2")%. To learn more, here is a quick video: Properties of Exponents

The technique is to minimize what is referred to as a loss function. The loss function we will use is called the mean squared error (MSE).

%MSE = 1/{2n} \sum _{i=1}^{n}(Real_{i} - P_{i})^2% where %n = "number of patients"%

%P_{i}% is the predicted value for %"patient"_{i}%, %P_{i} = e^(F0_{i}+F1_{i}+F2_{i}...)%

So we want to find the values of {f0 ... fn} to minimise MSE
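The loss defined above can be computed directly. This is a minimal sketch, assuming each patient is a plain object with `real` and `factorFlag` fields; the function names `predict` and `mse` and the sample data are hypothetical.

```javascript
// Predicted cost P_i = exp(sum of the factors that are flagged for the patient).
function predict(factorFlag, f) {
  return Math.exp(factorFlag.reduce((s, flag, j) => s + flag * f[j], 0));
}

// MSE = 1/(2n) * sum over patients of (Real_i - P_i)^2
function mse(episodes, f) {
  const sum = episodes.reduce((s, ep) => {
    const diff = ep.real - predict(ep.factorFlag, f);
    return s + diff * diff;
  }, 0);
  return sum / (2 * episodes.length);
}

// Two hypothetical patients; only the second one has factor1.
const sample = [
  { factorFlag: [1, 0], real: 100 },
  { factorFlag: [1, 1], real: 150 },
];

// Correct factors: the loss is approximately 0.
console.log(mse(sample, [Math.log(100), Math.log(1.5)]));
// Ignoring factor1 (f1 = 0) misses the second patient by 50:
// 50^2 / (2 * 2) = 625, up to floating point rounding.
console.log(mse(sample, [Math.log(100), 0]));
```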

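%{del P_{i}}/{del f0} = P_{i}% is another beautiful property of exponentials. The same is true for %f1, f2 ...% for every patient that has the corresponding factor; for a patient without the factor the derivative is 0.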

%F1_{i} = {(0,if text{factorFlag1} = 0),(f1,if text{factorFlag1} = 1):}%     for %"patient"_i%

Similarly for %F2_{i}, F3_{i} …% Note: Since all patients have %f0%,     %:. F0_{i} = f0%

The MSE is our loss function. The purpose of a loss function is to give a way to quantify how far the prediction is from the desired outcome. So our target is to adjust the values of the factors until the loss is as small as possible.

% {del (MSE)}/{del f0} = -1/n \sum _{i=1}^{n}(Real_{i} - P_{i})*P_{i} %     is the derivative of `MSE` with respect to %f0%

Gradient descent is an iterative algorithm for updating the factors %f0, f1, f2 …%

%f0_{"new"} = f0 - {del (MSE)}/{del f0} * epsilon %,     where %epsilon% is known as the learning rate and is usually a small fraction
Similarly, for %f1, f2 …% we do the same as above: %f1_{"new"} = f1 - {del (MSE)}/{del f1} * epsilon %
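The update rule can be sketched in a few lines. This is an illustration under stated assumptions: noise-free toy data with a base cost of 2 and a single extra factor of 1.5, and hypothetical values for `EPSILON` and `ITER`. The gradient used is %{del (MSE)}/{del fj} = -1/n \sum (Real_{i} - P_{i})*P_{i}% taken over the patients that have factor j, which follows from differentiating the MSE. The costs are kept small so that a fixed step size stays stable; with costs in the thousands the step would have to be much smaller, or the costs rescaled first.

```javascript
const EPSILON = 0.05; // learning rate (hypothetical value)
const ITER = 2000;    // number of iterations (hypothetical value)

// Noise-free toy data: base cost 2, factor1 multiplies the cost by 1.5.
const episodes = [
  { factorFlag: [1, 0], real: 2 },
  { factorFlag: [1, 1], real: 3 },
  { factorFlag: [1, 0], real: 2 },
  { factorFlag: [1, 1], real: 3 },
];

// P_i = exp(F0_i + F1_i + ...), where Fj_i is fj if the flag is set, else 0.
const predict = (flags, f) =>
  Math.exp(flags.reduce((s, flag, j) => s + flag * f[j], 0));

// One step: fj <- fj - EPSILON * dMSE/dfj, with
// dMSE/dfj = -1/n * sum over patients of (Real_i - P_i) * P_i * flag_ij
function step(f) {
  const grad = f.map(() => 0);
  for (const ep of episodes) {
    const p = predict(ep.factorFlag, f);
    ep.factorFlag.forEach((flag, j) => {
      grad[j] -= ((ep.real - p) * p * flag) / episodes.length;
    });
  }
  return f.map((fj, j) => fj - EPSILON * grad[j]);
}

let f = [0, 0]; // start from f0 = f1 = 0
for (let i = 0; i < ITER; i++) f = step(f);

// Converting back from log form: the values approach [2, 1.5].
console.log(f.map((v) => Math.exp(v)));
```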

Check out this YouTube video

 
 
     

Gradient Descent

 
 
     

On the next page we will plot the treatment data to show the predicted values and the corresponding real cost.




Page - 5




Visualize Factor Based Optimization Results

We will use D3 to visualize the factors that have been computed. The x-axis represents the predicted cost and the y-axis shows the actual cost. The dashed line is the predicted cost line. First, remember that this uses synthetic data, with a random spread of cost around values generated from risk factor values chosen beforehand. The point of the entire demo is that, given this data, the actual risk factors can be computed from the data itself.
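A plot of this kind could be sketched as follows. The function names `toPoints` and `renderScatter`, the sizes and the sample numbers are hypothetical, and the sketch assumes a recent D3 release (`selection.join` needs version 5.8 or later). The `d3` object is passed in as a parameter, so the data preparation can be run without a browser.

```javascript
// Pair each predicted cost (x) with its actual cost (y).
function toPoints(predicted, real) {
  return predicted.map((p, i) => ({ x: p, y: real[i] }));
}

// Draws the scatter plot into the element matched by `selector`.
function renderScatter(d3, selector, points, size = 400, margin = 40) {
  // Use the same range on both axes so the dashed line is a true diagonal.
  const top = d3.max(points, (d) => Math.max(d.x, d.y));
  const x = d3.scaleLinear().domain([0, top]).range([margin, size - margin]);
  const y = d3.scaleLinear().domain([0, top]).range([size - margin, margin]);

  const svg = d3.select(selector).append("svg")
    .attr("width", size).attr("height", size);

  svg.append("g").attr("transform", `translate(0,${size - margin})`).call(d3.axisBottom(x));
  svg.append("g").attr("transform", `translate(${margin},0)`).call(d3.axisLeft(y));

  // One dot per episode.
  svg.selectAll("circle").data(points).join("circle")
    .attr("cx", (d) => x(d.x)).attr("cy", (d) => y(d.y)).attr("r", 2);

  // Dashed line where the actual cost equals the predicted cost.
  svg.append("line")
    .attr("x1", x(0)).attr("y1", y(0)).attr("x2", x(top)).attr("y2", y(top))
    .attr("stroke", "black").attr("stroke-dasharray", "4 4");
}

const points = toPoints([100, 130], [95, 140]);
console.log(points);
```

In a page that has loaded D3, `renderScatter(d3, "#chart", points)` would draw the chart into an element with the hypothetical id `chart`. Dots close to the dashed line are episodes whose predicted cost matches the actual cost well.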

 
 
     

A much faster Adjustment algorithm

 
 
     

Render results using D3

 
 
     



Page - 6