Database and Models

1. Results

Made by Ari Brown in February - April 2021. The results are visible here, with links to other years as well. The code is available on GitHub.

This is not financial advice -- I am not a financial advisor.

2. Database Schema

I'm using an SQLite3 database, because I'm a simple man with simple pleasures.

The rest of the program is discussed here.

{db.rb 2}
require 'sqlite3'
require 'sequel'
require 'alpaca/trade/api'

DB = Sequel.connect "sqlite://#{CONFIG[:DB][:path]}"

{Bar table, 3}

{Ticker table, 8}

{Split table, 15}

# Load the models for handling the data
require './models/ticker.rb'
require './models/split.rb'
require './models/bar.rb'

{Alpaca Ruby API integration, 17}

3. Bar Table

Just a way to track the standard OHLC data. I also want to know the volume, and I track the "span" because I want to be prepared to handle intraday data as well.

{Bar table 3}
DB.create_table? :bars do
  primary_key :id
  foreign_key :ticker_id, :tickers
  float :close
  float :high
  float :low
  float :open
  datetime :datetime
  integer :volume
  string :span # day, 15 min, 5 min, 1 min
  integer :rank
  float :value

  index :ticker_id
end

Used in section 2

4. Bar Model

{models/bar.rb 4}
class Bar < Sequel::Model
  many_to_one :ticker

  {Alpaca Bar Integration, 18}

  def inspect
    "#<Bar id => #{id}, sym => #{ticker.symbol}, date => #{date.strftime "%Y-%m-%d"}, o => #{open}, c => #{close}, h => #{high}, l => #{low}>"
  end

  # GREAT trick here: the goal is to figure out how many records
  # exist between `self` and `from`. Since a simple date calculation
  # won't work, we just ask the DB to see what's in store for us.
  #
  # This originally loaded up `ticker.bars` and then got the indices
  # for the two elements, but then that got turned into querying the DB
  # directly and operating on that list, but then I realized it was the
  # size of that array minus 1, and then I realized I could just get the
  # count directly from the DB.
  def trading_days_from(from)
    raise unless from.ticker == ticker

    dates = [date, from.date].sort

    Bar.where(:ticker => ticker, :date => dates[0]..dates[-1])
       .order(Sequel.asc(:date))
       .count - 1
  end

  {Measurements of Future Bar Change, 6}

  {Measurements of Past Bar Performance, 7}
end

5. Bar normalization

Because Alpaca data is unadjusted for splits/reverse splits, I have to do that adjustment ourselves. This involves aggregating split data and then normalizing the stock data from before the splits accordingly. The adjusted prices are made permanent in the database.

6. Measurements of Future Bar Change

Useful methods for determining how the stock does in the future from this bar. I look at how a stock fares over a given number of days, and how many days it takes for a stock to meet a certain metric.


TODO: In the same way that there's Bar#history, there should probably be a Bar#future.

{Measurements of Future Bar Change 6}
  # the "rise" part of the name is baked into the `>=`
  # How many trading days does it take to rise by `percent`?
  # Returns -1 if it never does.
  def time_to_rise(percent)
    #bars  = self.ticker.bars
    bars  = Bar.where(:ticker => ticker) { date >= self.date }
               .order(Sequel.asc(:date))
               .all
    index = bars.index self
    #i = bars[index..-1].index

    (index..bars.size - 1).each do |j|
      if bars[j].change_from(bars[index]) >= percent
        #if self.before_split?(:days => j - index)
        #  return -2
        #else
          return j - index
        #end
      end
    end

    -1
  end

  # What is the maximum percent rise when compared to `self` over the next
  # `days` trading days?
  def max_rise_over(days)
    bars  = Bar.where(:ticker => ticker,
                      :date => date..(date + (days * 1.4).ceil * 86_400))
               .order(Sequel.asc(:date))
               .all
    index = bars.index self
    range = bars[index..(index + days)]
    range.map {|b| [b, b.change_from(self)] }.max_by {|a| a[1] }
  end

  # When is the first two-day period that drops by `drop`, after this bar?
  def drop(drop)
    bars  = Bar.where(:ticker => ticker) { date >= self.date }
               .order(Sequel.asc(:date))
               .all
    i    = bars.index self
    fin  = bars.size - 1
    bars = bars[i..fin]

    # oldest bar is first
    bars.each_cons(2).find do |span|
      span[-1].change_from(span[0]) <= drop
    end
  end

  # When is the first two-day period that rises by `rise`, after this bar?
  def rise(rise)
    bars  = Bar.where(:ticker => ticker) { date >= self.date }
               .order(Sequel.asc(:date))
               .all
    i    = bars.index self
    fin  = bars.size - 1
    bars = bars[i..fin]

    # oldest bar is first
    bars.each_cons(2).find do |span|
      span[-1].change_from(span[0]) >= rise
    end
  end

Used in section 4

7. Measurements of Past Bar Performance

Tools for evaluating a stock's previous performance. The main algorithm has at times used the r^2 value, the MSE from the r^2, and the average volume. Now, my algorithm uses the rank attribute of each bar, whose formula is detailed on the main stocks page.

{Measurements of Past Bar Performance 7}
  def change_from(bar)
    (close - bar.open).to_f / bar.open
  end

  def rsquared(prior: 10)
    ticker.rsquared(:at => self, :prior => prior)
  end

  def mse(prior: 10)
    ticker.mse(:at => self, :prior => prior)
  end

  def history(prior: 10, post: 0)
    ticker.history(:around => self, :prior => prior, :post => post)
  end

  def volumes(prior: 10)
    history(:prior => prior).map {|b| b.volume }
  end

  def regression(prior: 10)
    ticker.regression(:at => self, :prior => prior)
  end

Used in section 4

8. Ticker Table

No surprises here. Not all sources provide a complete list of all stock tickers, so it's good to keep track of them here. AlphaVantage, for instance, does not offer a complete listing, so when I query that API, I have to base the query off of tickers provided by AlpacaMarkets.

{Ticker table 8}
DB.create_table? :tickers do
  primary_key :id
  string :symbol
  string :exchange
end

Used in section 2

9. Ticker Model

Models a stock ticker: its symbol, the exchange it's traded on, the splits it's done, and the bars of data I have for it.

{models/ticker.rb 9}
class Ticker < Sequel::Model
  one_to_many :bars, :order => :date
  one_to_many :splits, :order => :date

  {Measurement of Ticker Ranking, 10}

  {Measurements of Past Ticker Performance, 11}

  {Measurements of Future Ticker Change, 12}

  {Split Handling, 13}

  {AlphaVantage Integration, 14}

  # hook to ensure no orphans
  def before_destroy
    splits.map {|s| s.destroy }
    bars.map {|b| b.destroy }
    super
  end
end

10. Measurement of Ticker Ranking

{Measurement of Ticker Ranking 10}
  # Return N_trade / P_day rankings
  #
  # This would be SO MUCH FASTER if I just wrote the SQL by hand (since Sequel
  # doesn't allow me to do GROUP BY and AVG)
  def self.rankings(stocks: nil, date: Time.parse(Date.today.to_s), prior: 10)
    @@rankings ||= {}
    return @@rankings[[stocks, date, prior]] if @@rankings[[stocks, date, prior]]

    tids = stocks.map {|t| t.id}

    query = DB[:bars].where(:ticker_id => tids, :date => (date - prior.days)..date)
                     .group(:ticker_id)
                     .select_append(:ticker_id)
                     .sql
    query.gsub! "*", "AVG(`volume`)"
    volumes = DB.fetch(query).all.inject({}) do |vols, hash|
      vols[hash[:ticker_id]] = hash[:"AVG(`volume`)"]
      vols
    end

    closes = DB[:bars].select(:close, :ticker_id)
                      .where(:ticker_id => tids, :date => date)
                      .all
    closes = closes.inject({}) do |cls, hash|
      cls[hash[:ticker_id]] = hash[:close]
      cls
    end

    # {"SYM" => [Rank, Value]}
    ranks = {}

    values = {}
    stocks.each do |stock|
      if volumes[stock.id] == nil || closes[stock.id] == nil
        values[stock] = 0
      else
        values[stock] = volumes[stock.id] / closes[stock.id]
      end
    end
    sorted_values = values.values.sort.reverse

    values.each {|tick, value| ranks[tick.id] = [sorted_values.index(value), value] }

    @@rankings[[stocks, date, prior]] = ranks
  end

Used in section 9

11. Measurements of Past Ticker Performance

{Measurements of Past Ticker Performance 11}
  # bar history based on a bar
  def history(around: nil, prior: 10, post: 5)
    idx  = around ? bars.index(around) : -1
    bars[[idx - prior + 1, 0].max..(idx + post)]
  end

  def volumes(at: nil, prior: 10, post: 0)
    history(:around => at, :prior => prior, :post => post).map {|b| b.volume }
  end

  # This is an absolute value, so a line with 3% shift of a trend at $600 will
  # have a greater MSE than a line with 3% shift of a trend at $6.
  # The cheaper a stock is, the more volatility it is allowed to have.
  def mse(at: nil, prior: 10)
    regression(:at => at, :prior => prior).meanSqError
  end

  # r^2 = 1 - NMSE
  #
  # Since I was originally using MSE and then I wanted to normalize it,
  # r^2 became an easier figure to use. Still gotta write my own function
  # for it (and particularly NMSE) so I can claim I understand it.
  def rsquared(at: nil, prior: 10)
    regression(:at => at, :prior => prior).rSquared
  end

  def regression(at: nil, prior: 10)
    domain   = history(:around => at, :prior => prior, :post => 0)[0..-2]
    prices   = domain.map {|b| [b.open, b.close] }.flatten
    xs       = (1..prices.size).to_a

    line_fit = LineFit.new
    line_fit.setData xs, prices 
    line_fit
  end

Used in section 9

12. Measurements of Future Ticker Change

{Measurements of Future Ticker Change 12}
  # Find all 2-day drops of at least `drop`
  def drops(drop)
    # oldest bar is first
    bars.each_cons(2).filter do |span|
      span[-1].change_from(span[0]) <= drop
    end
  end

  # Find all 2-day rises of at least `rise`
  def rises(rise)
    # oldest bar is first
    bars.each_cons(2).filter do |span|
      span[-1].change_from(span[0]) >= rise
    end
  end

Used in section 9

13. Split Handling

A common bug when writing split-handling code is that I will accidentally count calendar days, when I should be counting trading days.

Splits are recorded in the database and track whether they've been applied to the data. The adjusted prices are thus also permanently recorded. The original price is theoretically available because of the ratios are stored as strings ("1 for 20") in each split record, but I have not verified that they are accurate.

{Split Handling 13}
  # normalize the prices to get rid of splits
  # percentage drops will still be evident
  #
  # THIS SHOULD BE RARELY CALLED
  # THE DATA SHOULD BE STORED IN ITS NORMALIZED FORM
  def normalize!(debug: false)
    # operating on hashes and optimized to minimize calls to the DB
    # and also minimizing the number of objects created
    ticker.splits.each do |split|
      next if split.applied

      unnorm_size = DB[:bars].where(:ticker_id => id,
                                    :date => Time.parse('1 jan 1900')..split[:date])
                             .count

      next unless unnorm_size >= 2

      unnormal = DB[:bars].where(:ticker_id => id,
                                 :date => (split[:date] - 30 * 86400)..split[:date])
                          .order(Sequel.asc(:date))
                          .all
      ratio = unnormal[-1][:open] / unnormal[-2][:close]

      puts "\tupdating #{unnorm_size} bars" if debug

      DB[:bars].where(:ticker_id => id,
                      :date => Time.parse('1 jan 1900')..(split[:date] - 1.day))
               .update(:close => Sequel[:close] * ratio,
                       :open  => Sequel[:open]  * ratio,
                       :high  => Sequel[:high]  * ratio,
                       :low   => Sequel[:low]   * ratio)

      split.applied = true
      split.save
    end
  end

  def normalized?; splits.all {|s| s.applied } ; end

Used in section 9

14. AlphaVantage Integration

AlphaVantage is a site that provides more accurate historical stock data than Alpaca, at the expense of being limited to 5 API calls/minute and 500 calls/day. In addition, I can only query one stock per call.

{AlphaVantage Integration 14}
  def download!(since: '2008-01-01')
    stock  = AV_CLIENT.stock :symbol => symbol
    series = stock.timeseries :outputsize => 'full'

    bars = series.output['Time Series (Daily)']
    bars = bars.filter {|k, bar| k > since }

    insertion = bars.map do |k, bar|
      {:date   => Time.parse(k),
       :open   => bar['1. open'].to_f,
       :high   => bar['2. high'].to_f,
       :low    => bar['3. low'].to_f,
       :close  => bar['4. close'].to_f,
       :volume => bar['5. volume'].to_i,
       :span   => 'day',
       :ticker_id => id
      }
    end
    DB[:bars].multi_insert insertion
  end

Used in section 9

15. Split Table

Split information is important to correct for quantum price shifts. The data from Alpaca (and Yahoo! Finance) is uncorrected, which means if a split occurs, the price will appear to have jumped (e.g.) 450% from before to after.

I get my split information from https://stocksplithistory.com, and update the database through a separate script. I used to get my information from a different source, but I found that it was incomplete. That is why I use "ex" as a column name: it means "execution date".

{Split table 15}
DB.create_table? :splits do
  primary_key :id
  foreign_key :ticker_id, :tickers
  string :ratio
  datetime :date

  index :ticker_id
end

Used in section 2

16. Split Model

{models/split.rb 16}
class Split < Sequel::Model
  many_to_one :ticker
end

17. Alpaca Ruby API Integration

Just a helper method for quickly saving the results from a web query.

{Alpaca Ruby API integration 17}
class Alpaca::Trade::Api::Bar
  def date; @time; end

  def save(symbol, period)
    ::Bar.create :span   => period,
                 :close  => @close,
                 :high   => @high,
                 :low    => @low,
                 :open   => @open,
                 :date   => @time,
                 :volume => @volume,
                 :ticker_id => ::Ticker.where(:symbol => symbol).first.id
  end
end

class Numeric
  # useful for dealing with Time
  def days
    self * 86400.0
  end
  alias_method :day, :days
end

Used in section 2

18. Alpaca Integration

Easy way to download data within the Bar class. The time component is a String from the CLI, a Time object from SQLite3, and the Alpaca web API wants a String that is conveniently formatted from a DateTime object.

This is a common theme in the code.

{Alpaca Bar Integration 18}
  def self.download(tickers, opts={})
    span = opts.delete(:span) || 'day'

    opts.each do |k, v| 
      if [String, Date, DateTime, Time].include? v.class
        opts[k] = DateTime.parse(v.to_s).to_s
      end
    end

    # `CLIENT.bars` returns a hash, so this will also merge them all
    # into one. key collision will only happen if the key is duplicated
    # in the `ticker` argument.
    symbols = tickers.map {|t| t.symbol }
    symbols.each_slice(50).map do |ticks|
      CLIENT.bars span, ticks, opts
    end.inject({}) {|h, v| h.merge v }
  end

Used in section 4