Published: May 18, 2018
Most of this was written a while back, I wouldn't code some of the simulations the same way now, but everything still works so it's being left as is.
The packages needed for examples below.
library(gridExtra)
library(tidyverse)
theme_set(theme_minimal())
Handicappers that sell their picks to the general public don't win and are a bad bet for gamblers.
note: the only long term documented exception to this rule I know of is RAS but as they say "one point does not a trend make."
It's unfortunate that this even needs to be said. All of the common sense arguments against touts seem totally logical to me. Why if someone could win often enough to be a profitable bettor would he sell picks? If you find a pot of gold at the end of the rainbow do you call everyone over to share in your new fortunes? Maybe you do-- but professional gamblers' wouldn't. For the same reason the you can't find a book that will teach you how to become a winning bettor you won't find real professionals selling their winning picks. Once the information becomes public it loses its value.
But I'm not here to try and convince you with words why touts are long term losers, we will run some simulations to see how common it is for a coin-flipper to win and exactly how long until the law of large numbers catches up with him.
For the experiment we will create a number of bettors and watch their performance over a set number of bets.
To start off we will make some assumptions:
The sample
function in R makes it easy to run the simulations. First
we will define the parameters, i.e. the number of bettors, number of
bets, win/loss amount.
Next, the bet simulation is accomplished by repeatedly sampling from the
vector c(win, loss)
. The likelihood of either being selected is the
same by default. All of the simulations are put inside a matrix and
arranged by bettor.
If there's 100 bettors and they place 100 bets that's 10,000 simulated
bets. Since the number of samples to be taken is bigger than the sample
pool (win/lose) the setting replace =
must be set to TRUE
. All of
the sims are put into a matrix and the the shape of the matrix is the
number of bets by the number of bettors. We stick that matrix inside a
dataframe.
bets <- 100
bettors <- 100
win <- 100
loss <- -110
df <-
as_data_frame(
matrix(
sample(c(win, loss),
bettors * bets,
replace = TRUE),
nrow = bets))
To keep a running total of the bets across time we will use the
mutate_all()
function. Then we will add a bet_number
column by using
the 1:n()
function. Finally, use the gather()
function to manipulate
the data into a tidy
format
df <- df %>%
mutate_all(funs(cumsum)) %>%
mutate(bet_number = 1:n()) %>%
gather(bettor, balance, -bet_number)
Set the bet number as the x-axis and the balance as the y-axis. This layout tracks how the each bettor performs over the number of bets.
ggplot(df, aes(x = bet_number, y = balance)) +
geom_line(alpha = .55) +
labs(x = "Bet Number", y = "Balance ($)") +
scale_y_continuous(labels = scales::dollar)
The x-axis is the bet number. For an example "25" is the 25th bet in the sequence. We're interested in the far right of the graph that represents the amount of money the bettor ends up with after the sequence of bets. While each line represents a unique bettor our focus is the overall distribution (the average is the slightly darker line in the middle of the wave like shape).
After 100 bets quite a few bettors end their sequence up money. Remember that these bettors are coin-flippers and are just as likely to win as lose. However, since the bettors lose more money than they win (-110 vs +100) it's more likely that a bettor will end up negative. We can figure out how often we need to win to breakeven with some simple math.
110/(100 + 110) = .5238
In order to just breakeven after a sequnce of n bets we need to win 52.4% of our bets betting into -110 lines. The line between winning and losing is razor thin. The question then becomes how long until your luck runs out. Since we're going to changing parameters like number of bets, win and loss percentage and number of bettors we're going to turn our code into a reusable function.
bettor_sim <-
function(bets = 100, bettors = 100,
win = 100, loss = -110, winPct = .5) {
df <- as_data_frame(
matrix(
sample(c(win, loss),
bettors * bets,
replace = TRUE,
prob = c(winPct, (1 - winPct))
),
nrow = bets))
df %>%
mutate_all(funs(cumsum)) %>%
mutate(bet_number = 1:n()) %>%
gather(bettor, balance, -bet_number) %>%
ggplot(aes(x = bet_number, y = balance)) +
geom_line(alpha = .35) +
scale_y_continuous(labels = scales::dollar) +
labs(x = "Bet Number", y = "Balance ($)")
}
bettor_sim(bets = 5000)
The last row of the dataframe is the ending balance so it is possible to look at a distribution of the bettor results. We will have to alter the function a little to return the actual data instead of a plot.
bettor_sim_data <-
function(bets = 100, bettors = 100,
win = 100, loss = -110, winPct = .5) {
df <- as_data_frame(
matrix(
sample(c(win, loss),
bettors * bets,
replace = TRUE,
prob = c(winPct, (1 - winPct))
),
nrow = bets)) %>%
mutate_all(funs(cumsum)) %>%
mutate(bet_number = 1:n()) %>%
gather(bettor, balance, -bet_number)
df
}
bettor_sim_data(bets = 100, bettors = 500) %>%
filter(bet_number == 100) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "bisque", color = "white",binwidth = 500) +
scale_x_continuous(labels = scales::dollar)
The mean is slightly below zero after 100 bets meaning that most bettors broke even in their bets but the cost of juice brought them into the red. There's still quite a few bettors in the positive, some as high as $2,000. At 100 a bet 2k is 20 units. Quite a score. When the number of bets increases the number of bettors to the right of zero starts to decrease.
First let's increase the number of bets from 100 to 500, then to 2500, and finally 5000. (while 5000 bets sounds like a lot, if you placed 3 bets a day on average for 350 of the 365 days in a year you would hit five thousand in less than five years. In reality you will hit this number much sooner).
# Plot 1: 500 bettors 100 bets
p1 <- bettor_sim_data(bets = 100, bettors = 500) %>%
filter(bet_number == 100) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "bisque", color = "white",binwidth = 500) +
geom_vline(aes(xintercept=0), col ="black") +
scale_x_continuous(labels = scales::dollar) +
labs(title = "Bets: 100")
# Plot 2: 500 bettors 500 bets - remember to change bet_number in filter
p2 <- bettor_sim_data(bets = 500, bettors = 500) %>%
filter(bet_number == 500) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "chartreuse3", color = "white",binwidth = 2000) +
geom_vline(aes(xintercept=0), col ="black") +
scale_x_continuous(labels = scales::dollar) +
labs(title = "Bets: 500")
# Plot 3: 500 bettors 2500 bets - remember to change bet_number in filter
p3 <- bettor_sim_data(bets = 2500, bettors = 500) %>%
filter(bet_number == 2500) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "cyan2", color = "white",binwidth = 2500) +
geom_vline(aes(xintercept=0), col ="black") +
scale_x_continuous(labels = scales::dollar) +
labs(title = "Bets: 2500")
# Plot 4: 500 bettors 5000 bets - remember to change bet_number in filter
p4 <- bettor_sim_data(bets = 5000, bettors = 500) %>%
filter(bet_number == 5000) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "darkslateblue", color = "white",binwidth = 5000) +
geom_vline(aes(xintercept=0), col ="black") +
scale_x_continuous(labels = scales::dollar) +
labs(title = "Bets: 5000")
grid.arrange(p1, p2, p3, p4, nrow = 2, ncol = 2)
The black vertical line in each plot marks the $0 point. Everything to the left of the line is a negative result. As the number of bets rise the number of players that end in the negative increases. By 2,500 bets almost all of the bettors are negative. After 5,000 bets all of the bettors are far below zero costing them thousands of dollars. These results assume that each bettor wins 50% of his bets.
Before we get ahead of ourselves looking at a 70% win rate let's be a little more realistic and see what a 55% winner looks like. Rather than rerunning the same four functions from above and combining them again (with new win percentage parameters) we will turn the code above into a resuable function so we can easily rerun our simulations with new parameters.
bet_hist_dist <- function(bets, bettors, winPct, color,bw) {
bettor_sim_data(bets = bets,
bettors = bettors,
winPct = winPct) %>%
filter(bet_number == bets) %>%
ggplot(aes(balance)) +
geom_histogram(fill = color,
color = "white",
binwidth = bw) +
geom_vline(aes(xintercept = 0), color = "black") +
scale_x_continuous(labels = scales::dollar) +
labs(title = paste("Bets", bets))
}
bet_hist_dist(100, 500, .50, "azure3", 500)
The function will plot histogram similar to the histograms above.
There's a few extra parameters now; color
and bw
(which is short for
binwidth). These two parameters make it easier when we combine multiple
historgrams. Color is obvious but we turn bindwidth into a parameter
because increasing the number of bets also increases the amount won and
lost.vWe could just use a formula like bets * 3
for binwidth but
setting the parameter lets us control the shape of the histrogram to fit
in with the other histograms on the page.
Now that we have a working function that plots the data in a format it's time to rerun the function to explore how a 55% fares over different number of bets.
Instead of running the function four seperate times each with different
parameters we can use the pmap
function from the purrr
package. Pass
a list of lists containing the various parameters and the function to
pmap
. To keep things clean assign the list of lists to a object named
params. Next assign the results of pmap
to po
which is now a list
containing the four different histograms (the reason the number of
histograms is four is because our params list has a list containing four
items, pmap
will iterate over each setting).
Select each list item using the [[ ]]
notation and put all four of
them in the grid.arrange
function.
# The parameters that go to the bet_hist_dist function above.
params <- list(
bets = list(100, 500, 2500, 5000),
bettors = list(500),
winPct = list(.55),
color = list("bisque", "chartreuse3", "cyan2", "darkslateblue"),
bw = list(500, 1000, 2500, 5000)
)
po <- pmap(params, bet_hist_dist)
grid.arrange(po[[1]], po[[2]], po[[3]], po[[4]], nrow = 2, ncol = 2)
Considering that it's nearly impossible to go broke hitting 55% after 2500 bets and after 5000 bets you'll be up on average 30k (remember this is on $100 units) running the same sim on 60%+ winner is redundent.
If you can hit 55% consistently and get down it's like having your own private ATM. Think about that the next time you seeing someone on twitter selling their "70% Weekend Special" for $199.
So far we've been using a flat betting model where our bettors bet size never varies. While this might be easier to model it isn't very realistic. We could go with something like the Kelly criterion to determine the bet size but we'll opt for easy again and use "x" percentage of the bettor's bankroll. This means instead of using an infinite bankroll where a player can never run out of money we will start with a fixed sum. The bankroll size will determine the bet amount.
bettor_bankroll_sim <- function(bets=500, bet_size_p=0.05, bank_roll=1000, winPct=.5) {
results <- rep(0, bets)
for (i in 1:length(results)) {
bet <- bet_size_p * bank_roll
bet <- sample(c(bet, (bet+(bet*.1))*-1),1, prob = c(winPct, (1-winPct)))
bank_roll <- sum(bank_roll, bet)
results[i] <- bank_roll
}
results
}
bettor_bankroll_sim(500, .05, 1000) %>% head()
## [1] 1050.0000 992.2500 937.6762 886.1041 837.3683 879.2367
df <- bind_cols(as_data_frame(replicate(500, bettor_bankroll_sim())))
names(df) <- map_chr(1:500, function(.x) {
paste0("bettor_", .x)
})
sum(tail(df, 1) > 1000)
## [1] 15
The last bit of code gets the number of bettors that, after 500 bets, have a higher bank roll than they started with. That's out of 500 bettors, hitting 50% winners. Let's look at how 55% winners do.
df <-
bind_cols(as_data_frame(replicate(500, bettor_bankroll_sim(winPct = .55))))
names(df) <- map_chr(1:500, function(.x) {
paste0("bettor_", .x)
})
sum(tail(df, 1) > 1000)
## [1] 368
Let's take a look at the average balance of a 55% winner after 500 bets and a bet size equal to 5% of the bettor's bankroll.
df %>%
tail(1) %>%
rowMeans()
## [1] 3945.657
Now let's plot it.
df %>%
tail(1) %>%
gather(bettor, balance, 1:500) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "azure3", col="white", binwidth = 1500) +
scale_x_continuous(labels = scales::dollar) +
theme_minimal()
Plotting the final balance of each bettor gives a striking visual of how profitable being a 55% winner can be. This is only after 500 bets starting with $1,000. Let's do something silly and see what it looks like when we start hitting 60% and we will bump the number of bets up to 1,000.
df <-
bind_cols(as_data_frame(replicate(
500, bettor_bankroll_sim(bets = 1000, winPct = .60)
)))
df %>%
tail(1) %>%
gather(bettor, balance, 1:500) %>%
ggplot(., aes(balance)) +
geom_histogram(fill = "azure3", col="white") +
scale_x_continuous(labels = scales::dollar) +
theme_minimal()
You're not seeing that wrong. The plot shows the majority of bars between $0 and $20 million dollars. We purposely didn't set the binwidth in order not to spoil the surprise. Since that's such an extreme spread let's look at the how the average bettor did after 1,000 bets.
df %>%
tail(1) %>%
rowMeans()
## [1] 2941267
The mean is approx 2.7 million dollars. Take that in... On average if you hit 60% of your bets, betting into -110 lines, while betting 5% of your bankroll after starting with $1,000, you will end up 2.7 million dollars richer after 1,000 bets.
How does the best and worst bettors' fare?
df %>%
tail(1) %>%
t() %>%
summary()
## V1
## Min. : 14045
## 1st Qu.: 298185
## Median : 855186
## Mean : 2941267
## 3rd Qu.: 2452648
## Max. :52071785
The bettor with the worst 'luck' ends up 12x his starting balance while the bettor with the best outcome is sitting on roughly 64 million dollars.