Introduction

This is a slightly more elaborate example. In the following pages we will show you how to create a model for estimating the cost of medical treatment and how to display the data in charts using D3. Finally, it shows how powerful machine learning algorithms can be implemented in the browser.

There is some math in the example, and it is explained, but you can skip over it if you are not ready to read through it. Hopefully I have kept the explanation simple enough to understand.

Code example

This is a simple code example that creates a JavaScript array and displays it.

What just happened? If you execute the code above, it will display the array just created.

Note: the value of the last expression, array, is displayed.
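The snippet itself is interactive on the original page. A minimal sketch of what such a snippet might look like, assuming an array of five numbers (the values are made up for illustration):

```javascript
// Hypothetical stand-in for the interactive snippet; the values are illustrative.
let array = [1, 2, 3, 4, 5];

// In the interactive page the value of the last expression is displayed,
// so ending the snippet with the variable name shows its contents.
array;
```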

Some simple notes

You can edit the example above and run it as many times as you want
The code changes its theme once you edit it
A variable created within a !tryit snippet using let only exists in that snippet

Page - 1

Introduction to medical cost modeling

This is an example of optimization that is useful in certain kinds of modeling for estimating the cost of a service. The cost we are trying to estimate is that of treating a medical condition. This does not use a real medical condition or real data, as will be explained later.

Some utility functions

expL takes the exponential of each element of an array
rect acts like a rectifier in an electronic circuit: if the value is negative it returns 0, otherwise it returns the value. clamp is similar: it returns 0 for negative values and 1 otherwise
max returns the maximum value of a list
stdPdiv gets a standard deviation (measured about zero) using only the non-negative values of an array
range creates an array with elements [0, 1, 2, … n-1]
zip takes 2 lists (list1, list2) and returns a new array with the length of list1, where element i is a 2 element array [ list1[i], list2[i] ]

function rect(a) { return a<0?0:a;}
function clamp(a) { return a<0?0:1;}

// Array  processing functions
function max(list) {
  if(!list || list.length === 0) return 0;
  return list.reduce((m,v) => m<v?v:m, list[0]);
}
function expL(arr) { return arr.map(v => Math.exp(v)); }

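// Get the standard deviation of an array
function stdDiv(list) {
  if(!list || list.length <= 1) return 0;
  // accumulate [count, sum of squares, sum]
  let res = list.reduce((m,v) => ([m[0]+1, m[1]+v*v, m[2]+v]), [0,0,0]);
  let mean = res[2]/res[0];
  // guard against a tiny negative value caused by rounding
  return Math.sqrt(Math.max(0, res[1]/res[0] - mean*mean));
}

// Root mean square of the non-negative values of an array (negative values are skipped)
function stdPdiv(list) {
  if(!list || list.length === 0) return 0;
  // accumulate [count, sum of squares] over the non-negative values
  let res = list.reduce((m,v) => (v<0?m:[m[0]+1, m[1]+v*v]), [0,0]);
  if(res[0] === 0) return 0;
  return Math.sqrt(res[1]/res[0]);
}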

// create an array with elements [0, 1, 2, ... n-1]
function range(n) {
  let res = [];
  for(let i=0; i<n; i++) res.push(i);
  return res;
}

// return an array with the length of list1, where element i is a 2 element array [ list1[i], list2[i] ]
function zip(list1, list2) {
   return list1.map((v,i) => [v,  list2[i] ])
}

function unzip(list) {
  return [ list.map(([x,y]) => x),  list.map(([x,y]) => y) ];
}

// Add elements of an array
function sum(list,add) { 
    if(!add) add = (a,b) => a+b;
    return list.reduce(add,0); 
}

// function to create an approximately normally distributed random number (see the central limit theorem)
function normalSample(n=12) {
  let v = 0;
  if(n===0) return 0;
  for(let i=0; i<n; i++) v += (Math.random()-0.5);
  return v/n;
}
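A short usage sketch of some of these helpers. The definitions are repeated so the snippet runs on its own, and the sample size of 10000 is an arbitrary choice for illustration:

```javascript
// Helper definitions repeated from above so this sketch runs on its own.
function range(n) {
  const res = [];
  for (let i = 0; i < n; i++) res.push(i);
  return res;
}
function zip(list1, list2) { return list1.map((v, i) => [v, list2[i]]); }
function unzip(list) { return [list.map(([x]) => x), list.map(([, y]) => y)]; }
function normalSample(n = 12) {
  if (n === 0) return 0;
  let v = 0;
  for (let i = 0; i < n; i++) v += Math.random() - 0.5;
  return v / n;
}

// zip pairs up two lists; unzip splits the pairs back into two lists.
const pairs = zip(range(3), ['a', 'b', 'c']);   // [[0,'a'], [1,'b'], [2,'c']]
const [indices, letters] = unzip(pairs);        // [0,1,2] and ['a','b','c']

// Each sample is an average of 12 uniform values in [-0.5, 0.5),
// so it always lies in that interval and clusters around 0.
const samples = range(10000).map(() => normalSample());
const mean = samples.reduce((s, v) => s + v, 0) / samples.length;
const spread = Math.sqrt(
  samples.reduce((s, v) => s + v * v, 0) / samples.length - mean * mean);

console.log(pairs, indices, letters);
console.log('mean', mean.toFixed(4), 'spread', spread.toFixed(4));
```

A uniform value on an interval of width 1 has variance 1/12, and averaging 12 of them divides the variance by 12, so the spread should come out close to 1/12 (about 0.083).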

Page - 2

Modeling Medical treatment

This more elaborate example investigates a data science problem: modeling the cost of medical treatment for a particular medical condition. For this example we will create some synthetic data rather than use actual medical claims data. The model assumes that not all patients are the same; some have other underlying medical issues that make the treatment more expensive. We will call these risk factors. Examples of risk factors might be age (older than 65 or younger than 5), pregnancy, or high blood pressure. These risk factors increase the cost of treating the condition. The problem is that we only have the raw cost of treatment and knowledge of which underlying conditions each patient has. What we do not have is how much each condition affects the cost of treatment.

The purpose of the remaining sections is to model the cost of treatment and discover how these factors change it.

Average base cost: the cost incurred by most patients.
A number of additional complications that increase the cost of treatment. Examples of those may be age, if you are older or very young. A simple model is to assume that each condition increases the cost by a certain percentage.
- Cost = Base * factor1 * factor2 * factor3..., where factor1, factor2, … are the multipliers (percentage increases) on the cost of treating a patient with those additional factors.
- This can also be written in exponential form:
- Cost = exp( b + f1 + f2 + f3 ), where b = log(Base), f1 = log(factor1) …
The values of b, f1, f2… (also referred to as cost factors) are unknown, but we have data for the cost of service for thousands of patients. We will use the data to estimate the values of the cost factors.
This is similar to a regression problem.
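The two forms of the cost model can be compared with a small worked example. The base cost and the two multipliers below are made-up values, chosen only to show that both forms give the same number:

```javascript
// Illustrative values only: a base cost and two percentage increases.
const base = 1000;      // hypothetical average base cost
const factor1 = 1.2;    // e.g. a 20% increase
const factor2 = 1.5;    // e.g. a 50% increase

// Multiplicative form: Cost = Base * factor1 * factor2
const costProduct = base * factor1 * factor2;

// Exponential form: Cost = exp(b + f1 + f2)
// with b = log(Base), f1 = log(factor1), f2 = log(factor2)
const b = Math.log(base);
const f1 = Math.log(factor1);
const f2 = Math.log(factor2);
const costExp = Math.exp(b + f1 + f2);

console.log(costProduct, costExp); // both are 1800 up to floating-point rounding
```

Working with the logarithms turns the product into a sum, which is what makes the derivatives later in the example simple.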

Object to model the cost of treatment. It has the following attributes:

real the actual cost of treatment. Since we do not actually have real data, we will create some simulated data:
- All patients have the base factor b, in other words factor0 (100% probability).
- The other factors each have a probability of occurring: factor1 has a 30% chance, factor2 has a 10% chance, and so on. This is allocated using the fillFactor() function.
factorFlag an array of 1 or 0 for each of the cost factors: 1 = factor present for the patient, 0 = factor not present
current the current best estimate of the cost, based on the factor estimates

var TreatmentCost = class { 
  constructor(factorEstimates, factorIndicators, realFactor) {

if(factorEstimates.factorFlag) {
        this.factorFlag = factorEstimates.factorFlag;
        this.logReal = factorEstimates.logReal; 
    } 
    else {
      // array of flags: 1 = the patient has the risk factor, 0 = the patient does not have the risk factor
      this.factorFlag = Array.isArray(factorIndicators)? factorIndicators.slice() : factorIndicators();
      
      // create cost of treatment +/- 75% random fluctuations in the cost of treatment
      this.logReal = this.getRisk(realFactor)+Math.log(1 + normalSample()*1.5);
    }
    this.real = Math.exp(this.logReal);
    this.logCurrent = 0;
    this.current = 1;
  }

get code() {
    // convert the factor flag bits to an integer
    // since each factor flag is 1 or 0, this can be used
    // to create an binary integer (each factor is a bit in the integer)
    // code = flag0*1 + flag1*2 + flag2*4 ...
    return this.factorFlag.reduceRight((s,v) => s*2+v, 0);
  }

getRisk(factors) {
    let sumOfFactors = (total, v, i) => total+(v?factors[i]:0);
    return this.factorFlag.reduce(sumOfFactors, 0);
  }

getCostIndex(factors) {
      // sum of the factors that are present for this patient
      return sum(this.factorFlag, ((s, v, i) => v?s+factors[i]:s));
  }

hasRiskFactor(i) {
    return !!this.factorFlag[i];
  }

setCurrent(factorEstimates) {
    this.logCurrent = this.getRisk(factorEstimates);
    this.current = Math.exp(this.logCurrent);
  }

_setCurrent(cv) {
    this.current = cv;
    this.logCurrent = Math.log(cv);
  }

// negative derivative of the squared error with respect to factor[i]; this is used to perform a gradient descent
  // search for the optimal value of the factors so that the RMS difference between actual cost and computed cost
  // based on the factors is minimized.
  dx(i) {
    if(!this.hasRiskFactor(i)) return 0;
    return -(this.current-this.real)*this.current;
    //return -(this.logCurrent-this.logReal);
  }
}
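The code getter packs the factor flags into a single integer. A standalone sketch of the same idea; flagsToCode mirrors the getter, while codeToFlags is a hypothetical inverse added here only for illustration:

```javascript
// Pack an array of 0/1 flags into an integer, with flag i becoming bit i.
function flagsToCode(flags) {
  return flags.reduceRight((s, v) => s * 2 + v, 0);
}

// Hypothetical inverse: recover the flags from the code.
function codeToFlags(code, count) {
  const flags = [];
  for (let i = 0; i < count; i++) flags.push((code >> i) & 1);
  return flags;
}

const flags = [1, 0, 1, 1];        // factor0, factor2 and factor3 present
const code = flagsToCode(flags);   // 1*1 + 0*2 + 1*4 + 1*8 = 13
console.log(code, codeToFlags(code, flags.length));
```

Patients with the same code have exactly the same set of risk factors, so the code can serve as a group key.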

Page - 3

Create synthetic treatment data

In this section we will create some synthetic data for our analysis. For this we need an underlying model. A common choice is what we call the multiplicative model, in which each risk factor increases the cost of treatment by some percentage. So we have a base cost of treatment, which we will call factor0, and the other factors are factor1, factor2 …

Every patient has factor0 and a total cost of treatment. Each patient may or may not have factor1, and similarly for all the other factors. We choose which factors a patient has by rolling dice, i.e. each factor has a probability associated with it.

Finally, we randomly increase or decrease the cost of treatment to reflect real treatment cost data.
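The three steps above can be sketched as follows. This is a minimal illustration, not the page's own generator: the factor values and the sample size are assumptions, and pickFactors and makeEpisode are hypothetical helper names.

```javascript
// Approximately normal noise in [-0.5, 0.5), as in the utility functions.
function normalSample(n = 12) {
  let v = 0;
  for (let i = 0; i < n; i++) v += Math.random() - 0.5;
  return v / n;
}

// Assumed values, for illustration only. factor0 is the base cost.
const trueFactors = [1000, 1.3, 1.8].map(v => Math.log(v)); // log(Base), log(factor1), log(factor2)
const probabilities = [1.0, 0.3, 0.1];                       // chance that each factor is present

// Roll the dice once per factor (hypothetical helper).
function pickFactors(probs) {
  return probs.map(p => (Math.random() < p ? 1 : 0));
}

// Build one synthetic episode: multiplicative model plus a random fluctuation.
function makeEpisode() {
  const factorFlag = pickFactors(probabilities);
  const logModel = factorFlag.reduce((s, flag, i) => s + (flag ? trueFactors[i] : 0), 0);
  const noise = 1 + normalSample() * 1.5;   // between 0.25 and 1.75
  return { factorFlag, real: Math.exp(logModel) * noise };
}

const sampleEpisodes = Array.from({ length: 1000 }, makeEpisode);
const share1 = sampleEpisodes.filter(e => e.factorFlag[1]).length / sampleEpisodes.length;
console.log(sampleEpisodes[0], 'share with factor1:', share1);
```

Because Math.random() is always below 1, a probability of 1.0 guarantees that every episode has factor0.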

Setup some constants

EPI_COUNT - number of test treatment records to create (we call each of those an episode)
ITER - number of iterations used to compute the factors
EPSILON - step size for the gradient descent

Some space to inspect data

Display Histogram

Page - 4

Predict factors (using gradient descent)

We predict the factors iteratively using gradient descent. But before we do that, let us restate the problem to make the math a bit more elegant and actually simpler.

Exponential function

If we have the following expression (rule of exponentials)

%y = e^a * e^b% can be rewritten as %y = e^{a+b}%

Secondly, any value `x` can be rewritten as an exponential `x = e^{log x}`

Therefore the cost model %"Cost" = "Base" * "factor1" * "factor2"% can be rewritten as %"Cost" = e^{b + f1 + f2}%, where:

%b = log("Base")%, %f1 = log("factor1")% and %f2 = log("factor2")%. To learn more, here is a quick video: Properties of Exponents

The technique is to minimize what is referred to as a loss function. The loss function we will use is called the mean squared error (MSE).

%MSE = 1/{2n} \sum _{i=1}^{n}(Real_{i} - P_{i})^2% where %n = "number of patients"%

%P_{i}% is the predicted value for %"patient"_{i}%, %P_{i} = e^(F0_{i}+F1_{i}+F2_{i}...)%

So we want to find the values of {f0 ... fn} to minimise MSE
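As a minimal sketch, the loss can be computed like this. The three patients and their costs are made up, and predict and mse are names chosen for this illustration:

```javascript
// Mean squared error with the 1/(2n) convention used in the text.
function mse(real, predicted) {
  const n = real.length;
  const total = real.reduce((s, r, i) => s + (r - predicted[i]) ** 2, 0);
  return total / (2 * n);
}

// Prediction for one patient: exp of the sum of the factors that patient has.
function predict(flags, f) {
  return Math.exp(flags.reduce((s, flag, j) => s + (flag ? f[j] : 0), 0));
}

// Illustrative data: three patients, factors f0 (everyone) and f1 (some).
const flagRows = [[1, 0], [1, 1], [1, 0]];
const realCosts = [100, 150, 110];
const f = [Math.log(100), Math.log(1.5)];

const predictions = flagRows.map(row => predict(row, f)); // about [100, 150, 100]
console.log(mse(realCosts, predictions));                  // about 100 / 6
```

Only the third patient is mispredicted (by 10), so the loss is 10 squared divided by 2n = 6.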

%{del P_{i}}/{del f0} = P_{i}% is another beautiful property of exponentials; the same is true for %f1, f2 ...% for any patient who has that factor (otherwise the derivative is 0).

%F1_{i} = {(0,if text{factorFlag1} = 0),(f1,if text{factorFlag1} = 1):}% for %"patient"_i%

Similarly for %F2_{i}, F3_{i} …% Note: Since all patients have %f0%, %:. F0_{i} = f0%

The MSE is also known as the loss function. The purpose of a loss function is to give a way to quantify how far the prediction is from the desired outcome. So our target is to adjust the values of the factors so that the loss is as small as possible.

% {del (MSE)}/{del f0} = -1/n \sum _{i=1}^{n}(Real_{i} - P_{i})*P_{i} % is the derivative of `MSE` with respect to %f0% (the minus sign comes from differentiating %-P_{i}%)
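The derivative of the MSE with respect to a factor can be checked numerically. The sketch below compares the analytic form, -(1/n) times the sum of (Real - P) * P over the patients that have the factor, with a central finite difference. The data and the function names are assumptions made for this illustration:

```javascript
function predict(flags, f) {
  return Math.exp(flags.reduce((s, flag, j) => s + (flag ? f[j] : 0), 0));
}
function mse(rows, real, f) {
  const total = rows.reduce((s, row, i) => s + (real[i] - predict(row, f)) ** 2, 0);
  return total / (2 * rows.length);
}

// Analytic derivative of the MSE with respect to factor j:
// -(1/n) * sum over patients having factor j of (Real_i - P_i) * P_i
function gradient(rows, real, f, j) {
  const total = rows.reduce((s, row, i) => {
    const p = predict(row, f);
    return s + (row[j] ? (real[i] - p) * p : 0);
  }, 0);
  return -total / rows.length;
}

// Numerical derivative by central differences, for comparison.
function numericGradient(rows, real, f, j, h = 1e-6) {
  const up = f.slice(); up[j] += h;
  const down = f.slice(); down[j] -= h;
  return (mse(rows, real, up) - mse(rows, real, down)) / (2 * h);
}

const rows = [[1, 0], [1, 1], [1, 0], [1, 1]];
const real = [2.0, 3.1, 2.2, 2.9];
const f = [0.5, 0.2]; // deliberately not the best fit
for (let j = 0; j < f.length; j++) {
  console.log(j, gradient(rows, real, f, j), numericGradient(rows, real, f, j));
}
```

With these values every prediction is below the real cost, so both derivatives are negative: increasing either factor reduces the loss.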

Gradient descent is an iterative algorithm for updating the factors %f0, f1, f2 …%

%f0_{"new"} = f0 - {del (MSE)}/{del f0} * epsilon %, where %epsilon% is known as the learning rate and is usually a small fraction
Similarly, for %f1, f2 …% we do the same as above: %f1_{"new"} = f1 - {del (MSE)}/{del f1} * epsilon %
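A minimal sketch of this update rule on a tiny noise-free data set. It uses the plain 1/n gradient, which may differ from the scaling used in the page's own code, and the data, learning rate and iteration count are assumptions chosen so the example converges:

```javascript
// Noise-free illustrative data: four patients covering every flag combination.
const trueF = [Math.log(2), Math.log(1.5), Math.log(1.2)];
const rows = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]];

function predict(flags, f) {
  return Math.exp(flags.reduce((s, flag, j) => s + (flag ? f[j] : 0), 0));
}
const real = rows.map(row => predict(row, trueF)); // about 2, 3, 2.4, 3.6

// One gradient descent step: f_new = f - epsilon * d(MSE)/df for every factor.
function step(f, epsilon) {
  const preds = rows.map(row => predict(row, f));
  return f.map((fj, j) => {
    const grad = -rows.reduce(
      (s, row, i) => s + (row[j] ? (real[i] - preds[i]) * preds[i] : 0), 0) / rows.length;
    return fj - epsilon * grad;
  });
}

let estimate = [0, 0, 0];   // start with every factor at exp(0) = 1
const EPSILON = 0.02;       // assumed learning rate
const ITER = 5000;          // assumed iteration count
for (let k = 0; k < ITER; k++) estimate = step(estimate, EPSILON);

console.log(estimate.map(v => Math.exp(v))); // approaches [2, 1.5, 1.2]
```

If the learning rate is too large the estimates can overshoot and diverge; if it is too small, many more iterations are needed.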

Check out this YouTube video:

Gradient Descent

//$$.D(dx(3)/total, total );
let tcomp = factorEstimates;
let scaleFunc = () =>  total;

// perform gradient descent on factors
for(let i=0; i< ITER; i++) {
  tcomp = computeUpdatedFactors(episodes, tcomp, EPSILON, scaleFunc);
}
let costToMember = sum(episodes.map(t => rect(t.real-t.current)));
let costSaved = sum(episodes.map(t => rect(t.current-t.real)));
let outOfPocketMembers = sum(episodes.map(t => clamp(t.real-t.current)));
// $$.D(sum(episodes.map(t => t.real-t.current)), costToMember/outOfPocketMembers, costToMember, total, costSaved, 
//      outOfPocketMembers, max(episodes.map(t => t.real-t.current)));
// [tcomp.map((v,i) => (Math.exp(v)-Math.exp(actualFactors[i]))/Math.exp(actualFactors[i])*100+'%'), expL(tcomp), expL(actualFactors)]
$$.D( {
     "Real Cost - Projected Cost": sum(episodes.map(t => t.real-t.current)), 
     "Average Patient Out of Pocket": costToMember/outOfPocketMembers, 
     "Total cost to all Patients": costToMember, 
     "Total cost of treatment (All patients)": total, 
     "Savings for insurance company": costSaved, 
     "Number of patients out of pocket": outOfPocketMembers, 
     "Highest out of pocket": max(episodes.map(t => t.real-t.current))
    });
({ 
   "Computed Values": expL(tcomp), 
   "Actual Values": expL(actualFactors),
   "Difference Between Computed and Actual" : tcomp.map((v,i) => (Math.exp(v)-Math.exp(actualFactors[i]))/Math.exp(actualFactors[i])*100+'%'), 
})

On the next page we will plot the treatment data to show the predicted values and the corresponding real cost.

Page - 5

Visualize Factor Based Optimization Results

We will use D3 to visualize the results computed from the factors. The x-axis represents the predicted cost and the y-axis shows the actual cost. The dashed line is the predicted cost line, where the actual cost equals the predicted cost. First, remember that this uses synthetic data with a random spread of cost around values generated from a priori risk factors. The point of the demo is that, given this data, the actual risk factors can be computed from the data itself.

function scatter(episodes, plotArea) {

if(arguments.length <= 1) plotArea = genID($$.executeDiv);
$$.lastly( () => {
    var id = document.getElementById(plotArea);
    id.innerHTML = '';
    var margin = {top: 10, right: 30, bottom: 30, left: 60},
       width = 1024 - margin.left - margin.right,
       height = 512 - margin.top - margin.bottom;
       // append the svg object to the body of the page
       var svg = d3.select(`#${plotArea}`)
         .append("svg")
           .attr("width", width + margin.left + margin.right)
           .attr("height", height + margin.top + margin.bottom)
         .append("g")
           .attr("transform",
                 "translate(" + margin.left + "," + margin.top + ")");

//Read the data
    (function(epi) {
     // 
     // Add X axis
     var x = d3.scaleLog()
       .domain([1, 1])
       .range([ 0, 0 ]);
     svg.append("g")
       .attr("class", "myXaxis")   // Note that here we give a class to the X axis, to be able to call it later and modify it
       .attr("transform", "translate(0," + height + ")")
       .call(d3.axisBottom(x))
       .attr("opacity", "0")
     var [yLow, yHigh] = d3.extent(epi.map(d => d.real));
     // Add Y axis
     var y = d3.scaleLog()
       //.domain([1, 500000])
       .domain([yLow*0.8, yHigh*1.1])
       .range([ height, 0]);
     svg.append("g")
       .call(d3.axisLeft(y));

// Add Axis Labels
      svg.append("text")
          .attr("text-anchor", "end")
          .attr("x", width)
          .attr("y", height + margin.top + 20)
          .text("Predicted Cost in $");

// Y axis label:
      svg.append("text")
          .attr("text-anchor", "end")
          .attr("transform", "rotate(-90)")
          .attr("y", -margin.left+20)
          .attr("x", -margin.top)
          .text("Real Cost in $")

// Add dots
     svg.append('g')
       .selectAll("dot")
       //.data(data)
       .data(epi)
       .enter()
       .append("circle")
         // .attr("cx", function (d) { return x(d.GrLivArea); } )
         // .attr("cy", function (d) { return y(d.SalePrice); } )
         .attr("cx", function (d) { return x(d.current); } )
         .attr("cy", function (d) { return y(d.real); } )
         .attr("r", 5)
         .style("fill", "#69b3a2")

var [xLow, xHigh] = d3.extent(epi.map(d => d.current));
           var xStart = xLow*0.9;
          // new X axis
          //x.domain([0, 4000])
          x.domain([xStart, xHigh *= 1.1])
           .range([ 0, width ]);

svg
            .append("line")           // attach prediction line
            .style("stroke", "blue")  // colour the line
            .attr("x1", x(xStart))     // x position of the first end of the line
            .attr("y1", y(xStart))      // y position of the first end of the line
            .attr("x2", x(xHigh))     // x position of the second end of the line
            .attr("y2", y(xHigh))    // y position of the second end of the line
            .attr("opacity", "0.25")
            .attr("stroke-dasharray","5, 5");
      
     svg.select(".myXaxis")
       .transition()
       .duration(2000)
       .attr("opacity", "0.75")
       .call(d3.axisBottom(x));

svg.selectAll("circle")
       .transition()
       .delay(function(d,i){return(i*1)})
       .duration(1000)
       .attr("cx", function (d) { return x(d.current); } )
       .attr("cy", function (d) { return y(d.real); } )
       .attr("opacity", "0.1")
    })(episodes);
  });

$$.D($$.HTML(`<div id="${plotArea}" />`));
  return plotArea;
}
scatter(episodes)

A much faster Adjustment algorithm

//$$.D(episodes)
    // Apply fn to every code j in [0, max) that has bit i set, accumulating into initial
    function _reduce(i, max, fn, initial) {
      let mask = 1 << i;
      for(let j=0; j<max; j++ ) {
        if(j&mask) initial = fn(initial,j);
      }
      return initial;
    }

let s = [];
       _reduce(1, 64,(s, i)=> (s.push(i),s),s);
       $$.D(s.length);
       
       let bitsNeeded = actualFactors.length;
       let codeMax = 1 << bitsNeeded;
       let counts = episodes.reduce((res,e) => (res[e.code]++, res), range(codeMax).fill(0));
       let sums = episodes.reduce((res,e) => (res[e.code]+=e.real, res), range(codeMax).fill(0));
       let codeFactors = range(1 << bitsNeeded).fill(1);
       let computedFactors = actualFactors.slice().fill(0);
       let total = sum(sums);
      $$.D(counts.slice(20,64),sums.slice(20,64))

function iterate(codeFactors, computedFactors, epsilon) {

for(let i=0; i< bitsNeeded; i++){
          let dx = adjustFactor(i)*epsilon;
          let expDx = Math.exp(dx);
          _reduce(i,codeMax, (s,ix)=>(s[ix] *= expDx, s), codeFactors);
          computedFactors[i] += dx;
        }
        function adjustFactor(i) {
           let _dx = _reduce(i,codeMax, 
                             (delta, ix) => delta+(sums[ix]-counts[ix]*codeFactors[ix])*codeFactors[ix],
                     0 );
           return (_dx/total);
        }
      }

for(let i=0; i< ITER; i++) {
      iterate(codeFactors, computedFactors, EPSILON);
    }
    episodes.forEach(e => e._setCurrent(codeFactors[e.code]));
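The code above appears to get its speed by working on groups rather than individual episodes: all patients with the same flag code share the same prediction, so the gradient only needs one count and one cost sum per code. A small sketch, with made-up episodes and hypothetical helper names, that checks the grouped sum against the per-patient sum:

```javascript
// Illustrative episodes: flags for [factor0, factor1] and a real cost.
const episodes = [
  { flags: [1, 0], real: 100 },
  { flags: [1, 1], real: 160 },
  { flags: [1, 0], real: 120 },
  { flags: [1, 1], real: 140 },
  { flags: [1, 0], real: 95 },
];
const f = [Math.log(100), Math.log(1.4)]; // current factor estimates (assumed)

const codeOf = flags => flags.reduceRight((s, v) => s * 2 + v, 0);
const predict = flags => Math.exp(flags.reduce((s, v, j) => s + (v ? f[j] : 0), 0));

// Per-patient sum of (Real - P) * P over patients that have factor i.
function perPatient(i) {
  return episodes.reduce((s, e) => {
    const p = predict(e.flags);
    return s + (e.flags[i] ? (e.real - p) * p : 0);
  }, 0);
}

// Grouped version: one count, one cost sum and one prediction per code.
const codeMax = 1 << f.length;
const counts = new Array(codeMax).fill(0);
const sums = new Array(codeMax).fill(0);
const codePrediction = new Array(codeMax).fill(1);
for (const e of episodes) {
  const c = codeOf(e.flags);
  counts[c] += 1;
  sums[c] += e.real;
  codePrediction[c] = predict(e.flags);
}
function grouped(i) {
  let total = 0;
  for (let c = 0; c < codeMax; c++) {
    // only codes with bit i set contain patients that have factor i
    if (c & (1 << i)) total += (sums[c] - counts[c] * codePrediction[c]) * codePrediction[c];
  }
  return total;
}

console.log(perPatient(0), grouped(0)); // both about 4300
console.log(perPatient(1), grouped(1)); // both about 2800
```

With thousands of episodes but only a handful of factors, the grouped loop touches at most 2^(number of factors) codes per iteration instead of every episode.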

Render results using D3

Page - 6