Made by Ari Brown in February - April 2021. The results are visible here, with links to other years as well. The code is available on GitHub.
This is not financial advice -- I am not a financial advisor.
I'm using an SQLite3 database, because I'm a simple man with simple pleasures.
The rest of the program is discussed here.
require 'sqlite3' require 'sequel' require 'alpaca/trade/api' DB = Sequel.connect "sqlite://#{CONFIG[:DB][:path]}" {Bar table, 3} {Ticker table, 8} {Split table, 15} # Load the models for handling the data require './models/ticker.rb' require './models/split.rb' require './models/bar.rb' {Alpaca Ruby API integration, 17}
Just a way to track the standard OHLC data. I also want to know the volume, and I track the "span" because I want to be prepared to handle intraday data as well.
class Bar < Sequel::Model many_to_one :ticker {Alpaca Bar Integration, 18} def inspect "#<Bar id => #{id}, sym => #{ticker.symbol}, date => #{date.strftime "%Y-%m-%d"}, o => #{open}, c => #{close}, h => #{high}, l => #{low}>" end # GREAT trick here: the goal is to figure out how many records # exist between `self` and `from`. Since a simple date calculation # won't work, we just ask the DB to see what's in store for us. # # This originally loaded up `ticker.bars` and then got the indices # for the two elements, but then that got turned into querying the DB # directly and operating on that list, but then I realized it was the # size of that array minus 1, and then I realized I could just get the # count directly from the DB. def trading_days_from(from) raise unless from.ticker == ticker dates = [date, from.date].sort Bar.where(:ticker => ticker, :date => dates[0]..dates[-1]) .order(Sequel.asc(:date)) .count - 1 end {Measurements of Future Bar Change, 6} {Measurements of Past Bar Performance, 7} end
Because Alpaca data is unadjusted for splits/reverse splits, I have to do that adjustment ourselves. This involves aggregating split data and then normalizing the stock data from before the splits accordingly. The adjusted prices are made permanent in the database.
Useful methods for determining how the stock does in the future from this bar. I look at how a stock fares over a given number of days, and how many days it takes for a stock to meet a certain metric.
TODO: In the same way that there's Bar#history, there should probably be a Bar#future.
# the "rise" part of the name is baked into the `>=` # How many trading days does it take to rise by `percent`? # Returns -1 if it never does. def time_to_rise(percent) #bars = self.ticker.bars bars = Bar.where(:ticker => ticker) { date >= self.date } .order(Sequel.asc(:date)) .all index = bars.index self #i = bars[index..-1].index (index..bars.size - 1).each do |j| if bars[j].change_from(bars[index]) >= percent #if self.before_split?(:days => j - index) # return -2 #else return j - index #end end end -1 end # What is the maximum percent rise when compared to `self` over the next # `days` trading days? def max_rise_over(days) bars = Bar.where(:ticker => ticker, :date => date..(date + (days * 1.4).ceil * 86_400)) .order(Sequel.asc(:date)) .all index = bars.index self range = bars[index..(index + days)] range.map {|b| [b, b.change_from(self)] }.max_by {|a| a[1] } end # When is the first two-day period that drops by `drop`, after this bar? def drop(drop) bars = Bar.where(:ticker => ticker) { date >= self.date } .order(Sequel.asc(:date)) .all i = bars.index self fin = bars.size - 1 bars = bars[i..fin] # oldest bar is first bars.each_cons(2).find do |span| span[-1].change_from(span[0]) <= drop end end # When is the first two-day period that rises by `rise`, after this bar? def rise(rise) bars = Bar.where(:ticker => ticker) { date >= self.date } .order(Sequel.asc(:date)) .all i = bars.index self fin = bars.size - 1 bars = bars[i..fin] # oldest bar is first bars.each_cons(2).find do |span| span[-1].change_from(span[0]) >= rise end end
Used in section 4
Tools for evaluating a stock's previous performance. The main algorithm has at
times used the r^2 value, the MSE from the r^2, and the average volume. Now,
my algorithm uses the rank
attribute of each bar, whose formula is detailed
on the main stocks page.
def change_from(bar) (close - bar.open).to_f / bar.open end def rsquared(prior: 10) ticker.rsquared(:at => self, :prior => prior) end def mse(prior: 10) ticker.mse(:at => self, :prior => prior) end def history(prior: 10, post: 0) ticker.history(:around => self, :prior => prior, :post => post) end def volumes(prior: 10) history(:prior => prior).map {|b| b.volume } end def regression(prior: 10) ticker.regression(:at => self, :prior => prior) end
Used in section 4
No surprises here. Not all sources provide a complete list of all stock tickers, so it's good to keep track of them here. AlphaVantage, for instance, does not offer a complete listing, so when I query that API, I have to base the query off of tickers provided by AlpacaMarkets.
Models a stock ticker: its symbol, the exchange it's traded on, the splits it's done, and the bars of data I have for it.
class Ticker < Sequel::Model one_to_many :bars, :order => :date one_to_many :splits, :order => :date {Measurement of Ticker Ranking, 10} {Measurements of Past Ticker Performance, 11} {Measurements of Future Ticker Change, 12} {Split Handling, 13} {AlphaVantage Integration, 14} # hook to ensure no orphans def before_destroy splits.map {|s| s.destroy } bars.map {|b| b.destroy } super end end
# Return N_trade / P_day rankings # # This would be SO MUCH FASTER if I just wrote the SQL by hand (since Sequel # doesn't allow me to do GROUP BY and AVG) def self.rankings(stocks: nil, date: Time.parse(Date.today.to_s), prior: 10) @@rankings ||= {} return @@rankings[[stocks, date, prior]] if @@rankings[[stocks, date, prior]] tids = stocks.map {|t| t.id} query = DB[:bars].where(:ticker_id => tids, :date => (date - prior.days)..date) .group(:ticker_id) .select_append(:ticker_id) .sql query.gsub! "*", "AVG(`volume`)" volumes = DB.fetch(query).all.inject({}) do |vols, hash| vols[hash[:ticker_id]] = hash[:"AVG(`volume`)"] vols end closes = DB[:bars].select(:close, :ticker_id) .where(:ticker_id => tids, :date => date) .all closes = closes.inject({}) do |cls, hash| cls[hash[:ticker_id]] = hash[:close] cls end # {"SYM" => [Rank, Value]} ranks = {} values = {} stocks.each do |stock| if volumes[stock.id] == nil || closes[stock.id] == nil values[stock] = 0 else values[stock] = volumes[stock.id] / closes[stock.id] end end sorted_values = values.values.sort.reverse values.each {|tick, value| ranks[tick.id] = [sorted_values.index(value), value] } @@rankings[[stocks, date, prior]] = ranks end
Used in section 9
# bar history based on a bar def history(around: nil, prior: 10, post: 5) idx = around ? bars.index(around) : -1 bars[[idx - prior + 1, 0].max..(idx + post)] end def volumes(at: nil, prior: 10, post: 0) history(:around => at, :prior => prior, :post => post).map {|b| b.volume } end # This is an absolute value, so a line with 3% shift of a trend at $600 will # have a greater MSE than a line with 3% shift of a trend at $6. # The cheaper a stock is, the more volatility it is allowed to have. def mse(at: nil, prior: 10) regression(:at => at, :prior => prior).meanSqError end # r^2 = 1 - NMSE # # Since I was originally using MSE and then I wanted to normalize it, # r^2 became an easier figure to use. Still gotta write my own function # for it (and particularly NMSE) so I can claim I understand it. def rsquared(at: nil, prior: 10) regression(:at => at, :prior => prior).rSquared end def regression(at: nil, prior: 10) domain = history(:around => at, :prior => prior, :post => 0)[0..-2] prices = domain.map {|b| [b.open, b.close] }.flatten xs = (1..prices.size).to_a line_fit = LineFit.new line_fit.setData xs, prices line_fit end
Used in section 9
# Find all 2-day drops of at least `drop` def drops(drop) # oldest bar is first bars.each_cons(2).filter do |span| span[-1].change_from(span[0]) <= drop end end # Find all 2-day rises of at least `rise` def rises(rise) # oldest bar is first bars.each_cons(2).filter do |span| span[-1].change_from(span[0]) >= rise end end
Used in section 9
A common bug when writing split-handling code is that I will accidentally count calendar days, when I should be counting trading days.
Splits are recorded in the database and track whether they've been applied to the data. The adjusted prices are thus also permanently recorded. The original price is theoretically available because of the ratios are stored as strings ("1 for 20") in each split record, but I have not verified that they are accurate.
# normalize the prices to get rid of splits # percentage drops will still be evident # # THIS SHOULD BE RARELY CALLED # THE DATA SHOULD BE STORED IN ITS NORMALIZED FORM def normalize!(debug: false) # operating on hashes and optimized to minimize calls to the DB # and also minimizing the number of objects created ticker.splits.each do |split| next if split.applied unnorm_size = DB[:bars].where(:ticker_id => id, :date => Time.parse('1 jan 1900')..split[:date]) .count next unless unnorm_size >= 2 unnormal = DB[:bars].where(:ticker_id => id, :date => (split[:date] - 30 * 86400)..split[:date]) .order(Sequel.asc(:date)) .all ratio = unnormal[-1][:open] / unnormal[-2][:close] puts "\tupdating #{unnorm_size} bars" if debug DB[:bars].where(:ticker_id => id, :date => Time.parse('1 jan 1900')..(split[:date] - 1.day)) .update(:close => Sequel[:close] * ratio, :open => Sequel[:open] * ratio, :high => Sequel[:high] * ratio, :low => Sequel[:low] * ratio) split.applied = true split.save end end def normalized?; splits.all {|s| s.applied } ; end
Used in section 9
AlphaVantage is a site that provides more accurate historical stock data than Alpaca, at the expense of being limited to 5 API calls/minute and 500 calls/day. In addition, I can only query one stock per call.
def download!(since: '2008-01-01') stock = AV_CLIENT.stock :symbol => symbol series = stock.timeseries :outputsize => 'full' bars = series.output['Time Series (Daily)'] bars = bars.filter {|k, bar| k > since } insertion = bars.map do |k, bar| {:date => Time.parse(k), :open => bar['1. open'].to_f, :high => bar['2. high'].to_f, :low => bar['3. low'].to_f, :close => bar['4. close'].to_f, :volume => bar['5. volume'].to_i, :span => 'day', :ticker_id => id } end DB[:bars].multi_insert insertion end
Used in section 9
Split information is important to correct for quantum price shifts. The data from Alpaca (and Yahoo! Finance) is uncorrected, which means if a split occurs, the price will appear to have jumped (e.g.) 450% from before to after.
I get my split information from https://stocksplithistory.com, and update the database through a separate script. I used to get my information from a different source, but I found that it was incomplete. That is why I use "ex" as a column name: it means "execution date".
class Split < Sequel::Model many_to_one :ticker end
Just a helper method for quickly saving the results from a web query.
class Alpaca::Trade::Api::Bar def date; @time; end def save(symbol, period) ::Bar.create :span => period, :close => @close, :high => @high, :low => @low, :open => @open, :date => @time, :volume => @volume, :ticker_id => ::Ticker.where(:symbol => symbol).first.id end end class Numeric # useful for dealing with Time def days self * 86400.0 end alias_method :day, :days end
Used in section 2
Easy way to download data within the Bar class. The time component is a String from the CLI, a Time object from SQLite3, and the Alpaca web API wants a String that is conveniently formatted from a DateTime object.
This is a common theme in the code.
def self.download(tickers, opts={}) span = opts.delete(:span) || 'day' opts.each do |k, v| if [String, Date, DateTime, Time].include? v.class opts[k] = DateTime.parse(v.to_s).to_s end end # `CLIENT.bars` returns a hash, so this will also merge them all # into one. key collision will only happen if the key is duplicated # in the `ticker` argument. symbols = tickers.map {|t| t.symbol } symbols.each_slice(50).map do |ticks| CLIENT.bars span, ticks, opts end.inject({}) {|h, v| h.merge v } end
Used in section 4