It seems I haven’t used my blog for a while.

I created a small colab on a simple RL agent.

Over at github I have put the following:

This introduces a few known and a few new forecast functions. It then builds an **ensemble forecast** out of 13 models. It has the following steps:

- Learn all models over training period
- Predict h periods ahead and build a weighted Bayesian model of the forecasts
- Retrain the model on training + h to give new forecasts beyond this period (using previous weights)

It introduces four Bayesian models in stan

- ARMA(2, 1)
- ARMA(2, 1) with weighting of obs
- Local linear trend
- Weight model (eg it can model 13 weights on 13 X variables and 10 time steps which is not possible in frequentist setup)

Note that most code has tests around the functions. You need to load all scripts to get the `forecastEns()`

to run.

Please see this blog post.

There has been a lot of criticism that many modern algorithms enforce inequality and are racist. This example shows how a predictive model can be made less discriminatory.

Nowadays supermarket customers can check out themselves and scan products themselves. Many obviously pay by card but I was interested to see how a cash dispensing till would work.

I have written some Python code to simulate the whole setup. We start with a till that has a distibution of coins (1000 1p etc). The key function is *change()*, which given a price and total paid (difference *rem*) has to issue the right coin change. It finds the largest coin below the rem and then works from largest to smallest to issue correct change. This way we try to avoid running out of small coins. This large-to-small algorithm is very simple and probably close how humans would do it. One improvement could be to under-sample coins that we have few off in the till.

The trickiest bit (*draw()*) is simulating what a customer would give in change. I had to add some randomness to it to stock up small coins (10% of time customer gives exact change in 1, 2 or 5p coins). Without this part the till will run out of change very quickly.

I work a lot with global variables in this example as I don’t have to handle function IO so much.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 |
# auto till: users pay for products but don't have exact change # what is the best algo to serve many customers and not run out of coins import random, sys customers = 10000 till = {1: 1000, 2: 1000, 5: 1000, 10: 1000, 20: 100, 50: 100, 100: 100} # in pence failure = 0 def total(till): return sum([k * v for k, v in till.iteritems()]) print(total(till)) def change(): # give change, start from biggest coin and pay out lower = sorted([k for k in till.keys() if k <= pay - price], reverse = True) rem = pay - price idx = 0 ch = {} while rem > 0 and idx < len(lower): cur = lower[idx] div = rem / cur # int div if till[cur] >= div and div > 0: rem -= div * cur ch[cur] = div idx += 1 return ch pay = 20 price = 17 print(change()) # test def rem(): # remove ch from till for k, v in ch.iteritems(): till[k] -= v if till[k] < 0: failure = 1 return None def add(): # add the new coins to till for k, v in coins.iteritems(): till[k] += v return None def draw(): # given price pay more than asked for # eg if price=357 return 100: 4 if random.random() < .1: # by chance give all 1 or 2p or 5p coins if price % 5 == 0: d = {5: price / 5} if price % 2 == 0: d = {2: price / 2} else: d = {1: price} else: lower = max([k for k in till.keys() if k <= price]) div = price / lower d = {lower: div + 1} return d for c in range(customers): price = int(1 + 200 * random.random()) coins = draw() pay = total(coins) if pay < price: failure = 2 ch = change() if total(ch) <> pay - price: failure = 3 rem() # give change before add new coins add() # add new coins if failure > 0: print(c, failure, price, pay, ch, till) sys.exit(1) print(till, total(till)) |

In this example I want to show the principles of dynamic programming and recursion in a simple example. Imagine you have coins of different values and you want to count the many ways that those coins can make up 200 pennies/cents. To solve this problem we use a recursive function which grows like a tree. You start with the biggest coins first.

Define a function add() which first loops through all coins. If we have an empty current set or the current coin is less or equal the last coin in the current set, then continue: if the sum of the set plus the proposed coin is less than 200, then add the current coin to the set and pass the new set to add(); if the sum of the set plus the current coin equals 200, add the coin to the set and add the new set to the set of solutions (do nothing if 200 is exceeded). Finally print the length of all solutions which is 73682.

This a recursive setup where add() references itself – you pass a current unfinished set to this function.

Here is the code in Python (v3) – it takes 7 seconds on repl.it.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
#as a tree tot = 200 coins = [200, 100, 50, 20, 10, 5, 2, 1] finals = [] #global def add(cur): for c in coins: if cur == [] or c <= cur[-1]: #add same or smaller if sum(cur) + c < tot: cur2 = cur + [c] add(cur2) elif sum(cur) + c == tot: cur2 = cur + [c] finals.append(cur2) add([]) # start with empty set print(len(finals)) |

I was interested to see how to match 2 different sets with slightly different distributions. This can be relevant when you want to test the differences between 2 groups.

Assume you have a long (1) and short set (2). First I sort set 2 by its values. I also estimate the mean difference between consecutive sorted values (mean diff).

My algorithm passes once through set 1 and tries to find a match for every set 1 item in set 2. If the current difference/distance is better than the previous or the next then we have a match (because it’s sorted), and we remove the matched item from set 2. I added a condition where the difference has to be between X multiples of the mean difference. This ensures that I don’t match some remaining large value/distance just because few match items remain. I also added another break condition: if distances get bigger, stop.

I matched 93% in my test. It takes 6% of n1*n2 possible iterations in my test with n1=60k and n2=10k.

The resulting distributions of the matches is an average between distribution 1 and 2. This means you cannot longer assume that the matched items represent the full sets 1 or 2. However you can compare matches to each other.

The code can be seen and run here https://repl.it/BU9Z

I have played around with finding number sequences of the form `a+b*c^d`

, where a/b/c/d can be the iterator x. The input is the first five numbers of the sequence. The JS code can be found here JSfiddle. It takes c 2.5 seconds to find all equations. This obviously doesn’t cover all interesting sequences, but might be a nice example how to use the `eval()`

function for this.

I have been writing some Javascript which reads data, creates the <table> HTML tags and then does conditional formatting of given numbers (based on five equal sized bins). Let me know if you find any bugs.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
//read data into table and conditonal format on simplified quintiles var condFormat=[0,1,1]; //1 for each column to index var data=[ ["x","y","z"], ["a",0.338394552,0.676789103], ["b",0.24817013,0.49634026], ["a",0.556582228,1.113164457], ["b",0.995206746,1.990413492], ["a",0.647278038,1.294556075], ["b",0.856245509,1.712491017], ["a",0.997568673,1.995137345], ["b",0.184854546,0.369709092], ["a",0.321731649,0.643463298], ["b",0.616996239,1.233992477], ["a",0.920923371,1.841846742], ["b",0.514028674,1.028057347], ["a",0.711318783,1.422637566], ["b",0.149519961,0.299039921] ]; var colours=['red','orange','#FF7E00','yellow','green']; //http://stackoverflow.com/questions/2044760/default-array-values Array.prototype.repeat= function(what, L){ while(L) this[--L]= what; return this; } function quint(x) { var cols=x[0].length; var max=[].repeat(-9999999999999,cols); var min=[].repeat(9999999999999,cols); var q=[]; for (var j=0;j<cols;j++) { for (var i=0;i<x.length;i++) { max[j]=Math.max(max[j],x[i][j]); min[j]=Math.min(min[j],x[i][j]); } diff=max[j]-min[j]; q.push([min[j]+diff/5,min[j]+2*diff/5,min[j]+3*diff/5,min[j]+4*diff/5]); } return q; } function popTable() { //find quintiles var data2=[]; for (var i=1;i<data.length;i++) { row=[]; for (var j=0;j<condFormat.length;j++) { if (condFormat[j]==1) { row.push(data[i][j]); } } data2.push(row); } q=quint(data2); console.log(q); var s=''; for (var i=0;i<data.length;i++) { s+='<tr>'; var j2=-1; for (var j=0;j<condFormat.length;j++) { if (condFormat[j]==1 && i>0) { j2+=1; c=colours[4]; for (var k=0;k<4;k++) { console.log(i,j,j2,k); if (data[i][j]<q[j2][k]) {c=colours[k]; break;} } s+="<td style='background-color:"+c+"'>"+data[i][j]+"</td>"; } else { s+="<td>"+data[i][j]+"</td>"; } } s+='</tr>'; } document.getElementById('tab').innerHTML=s; } |