This is a slightly more elaborate example. In the following pages we will show how to create a model for estimating the cost of medical treatment and how to display the data in charts using D3. Finally, it shows how powerful machine learning algorithms can be implemented in the browser.
There is some math in the example, and it is explained along the way, but you can skip over it if you are not ready to read through it. Hopefully I have kept the explanations simple enough to understand.
This is a simple code example that creates a JavaScript array and displays it.
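The snippet itself is not reproduced in this text, so here is a minimal sketch of what such a snippet could look like. The variable name array matches the note that follows; the values are made up for illustration.

```javascript
// Create an array with let; the values are illustrative assumptions.
let array = [1, 2, 3, 4, 5];

// In a notebook-style snippet the value of the last expression is what gets displayed.
array;
```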
What just happened? If you execute the code above, it will display the array that was just created.
Note: the value of the last expression, array, is displayed.
A variable declared with let only exists within the snippet.
This is an example of an optimization that is useful in certain kinds of modeling for estimating the cost of a service. The cost we are trying to estimate is that of treating a medical condition. The example does not use a real medical condition or real data, as will be explained later.
expL: takes the exponential of each element of an array.
rect: acts like a rectifier in an electronic circuit; if val is negative it returns 0, otherwise it returns the value.
clamp: is similar; it returns 0 for negative values and 1 for positive values.
max: returns the max value of a list.
stdPdiv: gets the standard deviation using only the positive values of an array.
range: creates an array with elements [0, 1, 2, … n-1].
zip: takes 2 lists (list1, list2) and returns a new array with the length of list1, where element i is a 2-element array [ list1[i], list2[i] ].
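The helper descriptions above can be sketched as follows. These are hypothetical implementations written only from the descriptions; the actual helpers used by the page may differ, and the handling of zero, empty lists, and the exact kind of standard deviation are assumptions marked in the comments.

```javascript
// Hypothetical implementations of the helpers described above.

// expL: exponentiate every element of an array.
const expL = (list) => list.map((v) => Math.exp(v));

// rect: rectifier, negative values become 0, everything else passes through.
const rect = (val) => (val < 0 ? 0 : val);

// clamp: 0 for negative values, 1 for positive values (assumption: 0 maps to 0).
const clamp = (val) => (val > 0 ? 1 : 0);

// max: largest value in a list (assumption: an empty list gives -Infinity).
const max = (list) => list.reduce((m, v) => (v > m ? v : m), -Infinity);

// stdPdiv: standard deviation over the positive values only
// (assumption: population standard deviation, 0 when nothing is positive).
const stdPdiv = (list) => {
  const pos = list.filter((v) => v > 0);
  if (pos.length === 0) return 0;
  const mean = pos.reduce((s, v) => s + v, 0) / pos.length;
  const variance = pos.reduce((s, v) => s + (v - mean) ** 2, 0) / pos.length;
  return Math.sqrt(variance);
};

// range: [0, 1, 2, ..., n-1].
const range = (n) => Array.from({ length: n }, (_, i) => i);

// zip: pairs element i of list1 with element i of list2; length follows list1.
const zip = (list1, list2) => list1.map((v, i) => [v, list2[i]]);
```

For instance, zip(range(3), expL([0, 0, 0])) pairs each index with the value 1.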
This more elaborate example investigates a data science problem: modeling the cost of medical treatment for a particular medical condition. For this example we will create some synthetic data rather than use actual medical claims data. The model assumes that not all patients are the same; some have other underlying medical issues that make the treatment more expensive. We will call these risk factors. Examples of risk factors might be age (older than 65 or younger than 5), pregnancy, or high blood pressure. These risk factors increase the cost of treating the condition. The problem is that we only have the raw cost of treatment and knowledge of each patient's underlying conditions. What we do not have is how those conditions affect the cost of treatment.
The purpose of the remaining sections is to model the cost of treatment and discover how these factors change it.
Cost = Base * factor1 * factor2 * factor3 ...
where factor1, factor2, ... represent the percentage increase in the cost of treating a patient who has those additional factors.
An object models the cost of treatment for each patient. It has the following attributes:
factorFlag: an array of 1 or 0 for each of the cost factors; 1 = factor present for the patient, 0 = factor not present. Every patient has factor0 (100% probability), factor1 has a 30% chance, factor2 has a 10% chance, and so on. The flags are allocated using the fillFactor() function.
real: holds the real cost of treatment. Since we do not actually have real data, we will create some simulated data.
current: the best estimate of the cost prediction, based on the current factor estimates.
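As a rough illustration of how these attributes fit the multiplicative formula, here is a small sketch. The attribute names factorFlag, real, and current come from the list above; the function predictCost and all numbers are made up for the example.

```javascript
// Illustrative factor values: factor0 is the base cost, the rest are multipliers.
const factors = [1000, 1.5, 1.2];

// One patient record using the attribute names from the text.
const patient = {
  factorFlag: [1, 1, 0], // has factor0 (always) and factor1, but not factor2
  real: 1580,            // observed cost (made-up value)
  current: 0,            // filled in below
};

// Hypothetical helper: multiply together the factors whose flag is 1.
function predictCost(factors, factorFlag) {
  return factors.reduce((cost, f, i) => (factorFlag[i] === 1 ? cost * f : cost), 1);
}

patient.current = predictCost(factors, patient.factorFlag);
console.log(patient.current, patient.real);
```

The gap between current and real is what the later sections try to shrink by adjusting the factors.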
In this section we will create some synthetic data for our analysis. For this we need an underlying model. A common choice is what we call the multiplicative model, namely that each risk factor increases the cost of treatment by some percentage. So we have a base cost of treatment, which we will call factor0, and the other factors are factor1, factor2, …
Every patient will have factor0 and a total cost of treatment. Some patients will have factor1 and some will not, and similarly for all the other factors. We choose which factors a patient has by rolling dice, i.e. each factor has a probability associated with it.
Finally, we randomly increase or decrease the cost of treatment to reflect real treatment cost data.
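The steps above can be sketched as follows. Only the name fillFactor and the attributes factorFlag, real, and current come from the text; the signature of fillFactor, the helper makePatient, the factor values, and the size of the random variation are all assumptions made for illustration.

```javascript
// Assumed values, chosen only for the example.
const trueFactors = [1000, 1.5, 1.2];  // factor0 = base cost, then multipliers
const probabilities = [1.0, 0.3, 0.1]; // chance that a patient has each factor
const noise = 0.1;                     // cost varies by up to +/-10% (assumption)

// Roll the dice once per factor: 1 = present, 0 = absent.
// rng must return numbers in [0, 1), so a probability of 1.0 always gives 1.
function fillFactor(probabilities, rng = Math.random) {
  return probabilities.map((p) => (rng() < p ? 1 : 0));
}

// Build one synthetic patient (hypothetical helper).
function makePatient(rng = Math.random) {
  const factorFlag = fillFactor(probabilities, rng);
  // Exact cost under the multiplicative model.
  const exact = trueFactors.reduce((c, f, i) => (factorFlag[i] ? c * f : c), 1);
  // Random increase or decrease, uniform in [1 - noise, 1 + noise).
  const jitter = 1 + noise * (2 * rng() - 1);
  return { factorFlag, real: exact * jitter, current: 0 };
}

const patients = Array.from({ length: 200 }, () => makePatient());
console.log(patients.slice(0, 3));
```

Passing the random source in as a parameter is a design choice that makes the generator easy to test with a fixed sequence.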
 
 
We predict the factors iteratively using gradient descent. But before we do that, let us restate the problem to make the math a bit more elegant and actually simpler.
We start from the following rule of exponentials:
%y = e^a * e^b% can be rewritten as %y = e^{a+b}%
Secondly, any positive value `x` can be rewritten as an exponential `x = e^{log x}`
Therefore a product of factors can be written as a single exponential of a sum, %"factor1" * "factor2" = e^{f1 + f2}%, where
%f1 = log("factor1")% and %f2 = log("factor2")% To learn more, here is a quick video: Properties of Exponents
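A quick numeric check of this restatement, using made-up factor values (they are assumptions, not taken from the data):

```javascript
// Illustrative factor values.
const factor0 = 1000;
const factor1 = 1.5;
const factor2 = 1.2;

// Work in log space: f = log(factor).
const f0 = Math.log(factor0);
const f1 = Math.log(factor1);
const f2 = Math.log(factor2);

const product = factor0 * factor1 * factor2; // multiplicative form
const viaExp = Math.exp(f0 + f1 + f2);       // additive form inside the exponent

// The two values agree up to floating-point rounding.
console.log(product, viaExp);
```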
The technique is to minimize what is referred to as a loss function. The loss function we will use is called the mean squared error (MSE).
%MSE = 1/{2n} \sum _{i=1}^{n}(Real_{i} - P_{i})^2% where %n = "number of patients"%
%P_{i}% is the predicted value for %"patient"_{i}%, %P_{i} = e^(F0_{i}+F1_{i}+F2_{i}...)%
So we want to find the values of {f0 ... fn} that minimize the MSE.
%{del P_{i}}/{del f0} = P_{i}% is another beautiful property of exponentials. The same is true for %f1, f2 ...% for every patient who has that factor; for a patient who does not have it, the derivative is 0.
%F1_{i} = {(0,if text{factorFlag1} = 0),(f1,if text{factorFlag1} = 1):}% for %"patient"_i%
Similarly for %F2_{i}, F3_{i} …% Note: Since all patients have %f0%, %:. F0_{i} = f0%
The MSE is also known as the loss function. The purpose of a loss function is to give a way to quantify how far the prediction is from the desired outcome. So our target is to adjust the values of the factors so that this loss becomes as small as possible.
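The loss described above can be sketched in a few lines. The function names predict and mse and the tiny data set are assumptions made for this example; the formulas follow the definitions of %P_{i}% and MSE given earlier.

```javascript
// Predicted cost for one patient: e raised to the sum of the log-factors that apply.
function predict(f, factorFlag) {
  const exponent = f.reduce((sum, fk, k) => sum + fk * factorFlag[k], 0);
  return Math.exp(exponent);
}

// MSE = 1/(2n) * sum over patients of (Real_i - P_i)^2
function mse(f, patients) {
  const total = patients.reduce((sum, p) => {
    const diff = p.real - predict(f, p.factorFlag);
    return sum + diff * diff;
  }, 0);
  return total / (2 * patients.length);
}

// Tiny made-up data set: two patients, two factors.
const patients = [
  { factorFlag: [1, 0], real: 2 },
  { factorFlag: [1, 1], real: 3 },
];

// With f = [0, 0] every prediction is e^0 = 1.
const loss = mse([0, 0], patients);
console.log(loss);
```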
% {del (MSE)}/{del f0} = -1/n \sum _{i=1}^{n}(Real_{i} - P_{i})*P_{i} % is the derivative of `MSE` with respect to %f0%. The minus sign comes from differentiating %-P_{i}% inside the square.
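One way to gain confidence in a derivative like this is to compare it with a finite-difference estimate. The sketch below does that on a made-up data set. The analytic form used here is the one obtained by applying the chain rule to the MSE defined earlier: it carries a leading minus sign, and for f1, f2, ... only patients who have the factor contribute. All function names are assumptions.

```javascript
function predict(f, factorFlag) {
  return Math.exp(f.reduce((sum, fk, k) => sum + fk * factorFlag[k], 0));
}

function mse(f, patients) {
  const total = patients.reduce((s, p) => s + (p.real - predict(f, p.factorFlag)) ** 2, 0);
  return total / (2 * patients.length);
}

// Analytic gradient: d(MSE)/d(f_k) = -(1/n) * sum (Real_i - P_i) * P_i * flag_ik
function gradient(f, patients) {
  const grad = f.map(() => 0);
  for (const p of patients) {
    const pred = predict(f, p.factorFlag);
    const common = -(p.real - pred) * pred;
    p.factorFlag.forEach((flag, k) => { grad[k] += common * flag; });
  }
  return grad.map((g) => g / patients.length);
}

// Central finite-difference estimate of the same gradient.
function numericGradient(f, patients, h = 1e-6) {
  return f.map((_, k) => {
    const up = f.slice(); up[k] += h;
    const down = f.slice(); down[k] -= h;
    return (mse(up, patients) - mse(down, patients)) / (2 * h);
  });
}

// Made-up data: two patients, two factors.
const patients = [
  { factorFlag: [1, 0], real: 2 },
  { factorFlag: [1, 1], real: 3 },
];
const f = [0.2, 0.1];
const analytic = gradient(f, patients);
const numeric = numericGradient(f, patients);
console.log(analytic, numeric); // the two should agree closely
```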
Gradient descent is an iterative algorithm for updating the factors %f0, f1, f2 …%
%f0_{"new"} = f0 - {del (MSE)}/{del f0} * epsilon %, where %epsilon% is known as the learning rate and is usually a small fraction
Similarly, for %f1, f2 …% we do the same as above: %f1_{"new"} = f1 - {del (MSE)}/{del f1} * epsilon %
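Putting the update rule into a loop gives a sketch like the one below. It is not the page's actual implementation: the function names, the learning rate, the iteration count, and the noise-free toy data (built from made-up factors 2, 1.5, and 1.3) are all assumptions chosen so the example stays small.

```javascript
// Noise-free toy data generated from the made-up factors 2, 1.5 and 1.3.
const patients = [
  { factorFlag: [1, 0, 0], real: 2.0 },
  { factorFlag: [1, 1, 0], real: 3.0 },
  { factorFlag: [1, 0, 1], real: 2.6 },
  { factorFlag: [1, 1, 1], real: 3.9 },
];

// Predicted cost: e raised to the sum of the log-factors the patient has.
function predict(f, factorFlag) {
  return Math.exp(f.reduce((sum, fk, k) => sum + fk * factorFlag[k], 0));
}

// One gradient descent step: f_new = f - gradient * epsilon.
function step(f, patients, epsilon) {
  const grad = f.map(() => 0);
  for (const p of patients) {
    const pred = predict(f, p.factorFlag);
    const common = (pred - p.real) * pred; // equals -(Real_i - P_i) * P_i
    p.factorFlag.forEach((flag, k) => { grad[k] += common * flag; });
  }
  return f.map((fk, k) => fk - (grad[k] / patients.length) * epsilon);
}

const epsilon = 0.01; // learning rate (assumed value)
let f = [0, 0, 0];    // start with every factor equal to e^0 = 1
for (let iter = 0; iter < 5000; iter++) {
  f = step(f, patients, epsilon);
}

// Convert back from log space to the multiplicative factors.
const recovered = f.map((fk) => Math.exp(fk));
console.log(recovered);
```

Because the gradient grows roughly with the square of the predicted cost, data with costs in the thousands would likely need a much smaller learning rate or rescaled costs.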
Check out this YouTube video
  
On the next page we will plot the treatment data to show the predicted values and the corresponding real costs.
We will use D3 to visualize the factors that have been computed. The x-axis represents the predicted cost and the y-axis shows the actual cost. The dashed line is the predicted cost line. Remember that this uses synthetic data, with a random spread of costs generated from risk factor values chosen in advance. The point of the demo is that, given this data, the actual risk factors can be computed from the data itself.