Love, and to be Loved.

我愛你，你是自由的。

2016-10-13 | data science

An Introduction to Stock Market Data Analysis with Python (FW)

PART I

This post is the first in a two-part series on stock data analysis using Python, based on a lecture I gave on the subject for MATH 3900 (Data Science) at the University of Utah. In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance using *pandas**, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking. The final post will include practice problems. This first post discusses topics up to introducing moving averages.

NOTE: The information in this post is of a general nature containing information and opinions from the author’s perspective. None of the content of this post should be considered financial advice. Furthermore, any code written here is provided without any form of guarantee. Individuals who choose to use it do so at their own risk.

Introduction

Advanced mathematics and statistics has been present in finance for some time. Prior to the 1980s, banking and finance were well known for being “boring”; investment banking was distinct from commercial banking and the primary role of the industry was handling “simple” (at least in comparison to today) financial instruments, such as loans. Deregulation under the Reagan administration, coupled with an influx of mathematical talent, transformed the industry from the “boring” business of banking to what it is today, and since then, finance has joined the other sciences as a motivation for mathematical research and advancement. For example one of the biggest recent achievements of mathematics was the derivation of the Black-Scholes formula, which facilitated the pricing of stock options (a contract giving the holder the right to purchase or sell a stock at a particular price to the issuer of the option). That said, bad statistical models, including the Black-Scholes formula, hold part of the blame for the 2008 financial crisis.

In recent years, computer science has joined advanced mathematics in revolutionizing finance and trading, the practice of buying and selling of financial assets for the purpose of making a profit. In recent years, trading has become dominated by computers; algorithms are responsible for making rapid split-second trading decisions faster than humans could make (so rapidly, the speed at which light travels is a limitation when designing systems). Additionally, machine learning and data mining techniques are growing in popularity in the financial sector, and likely will continue to do so. In fact, a large part of algorithmic trading is high-frequency trading (HFT). While algorithms may outperform humans, the technology is still new and playing in a famously turbulent, high-stakes arena. HFT was responsible for phenomena such as the 2010 flash crash and a 2013 flash crash prompted by a hacked Associated Press tweet about an attack on the White House.

This lecture, however, will not be about how to crash the stock market with bad mathematical models or trading algorithms. Instead, I intend to provide you with basic tools for handling and analyzing stock market data with Python. I will also discuss moving averages, how to construct trading strategies using moving averages, how to formulate exit strategies upon entering a position, and how to evaluate a strategy with backtesting.

DISCLAIMER: THIS IS NOT FINANCIAL ADVICE!!! Furthermore, I have ZERO experience as a trader (a lot of this knowledge comes from a one-semester course on stock trading I took at Salt Lake Community College)! This is purely introductory knowledge, not enough to make a living trading stocks. People can and do lose money trading stocks, and you do so at your own risk!

Getting and Visualizing Stock Data

Getting Data from Yahoo! Finance with pandas

Before we play with stock data, we need to get it in some workable format. Stock data can be obtained from Yahoo! Finance, Google Finance, or a number of other sources, and the pandas package provides easy access to Yahoo! Finance and Google Finance data, along with other sources. In this lecture, we will get our data from Yahoo! Finance.

The following code demonstrates how to create directly a DataFrame object containing stock information. (You can read more about remote data accesshere.)

import pandas as pd
import pandas.io.data as web   # Package and modules for importing data; this code may change depending on pandas version
import datetime
 
# We will look at stock prices over the past year, starting at January 1, 2016
start = datetime.datetime(2016,1,1)
end = datetime.date.today()
 
# Let's get Apple stock data; Apple's ticker symbol is AAPL
# First argument is the series we want, second is the source ("yahoo" for Yahoo! Finance), third is the start date, fourth is the end date
apple = web.DataReader("AAPL", "yahoo", start, end)
 
type(apple)


C:\Anaconda3\lib\site-packages\pandas\io\data.py:35: FutureWarning: 
The pandas.io.data module is moved to a separate package (pandas-datareader) and will be removed from pandas in a future version.
After installing the pandas-datareader package (https://github.com/pydata/pandas-datareader), you can change the import ``from pandas.io import data, wb`` to ``from pandas_datareader import data, wb``.
  FutureWarning)





pandas.core.frame.DataFrame

1	apple.head()

	Open	High	Low	Close	Volume	Adj Close
Date
2016-01-04	102.610001	105.370003	102.000000	105.349998	67649400	103.586180
2016-01-05	105.750000	105.849998	102.410004	102.709999	55791000	100.990380
2016-01-06	100.559998	102.370003	99.870003	100.699997	68457400	99.014030
2016-01-07	98.680000	100.129997	96.430000	96.449997	81094400	94.835186
2016-01-08	98.550003	99.110001	96.760002	96.959999	70798000	95.336649

Let’s briefly discuss this. Open is the price of the stock at the beginning of the trading day (it need not be the closing price of the previous trading day),high is the highest price of the stock on that trading day, low the lowest price of the stock on that trading day, and close the price of the stock at closing time. Volume indicates how many stocks were traded. Adjusted close is the closing price of the stock that adjusts the price of the stock for corporate actions. While stock prices are considered to be set mostly by traders, stock splits (when the company makes each extant stock worth two and halves the price) and dividends (payout of company profits per share) also affect the price of a stock and should be accounted for.

Visualizing Stock Data

Now that we have stock data we would like to visualize it. I first demonstrate how to do so using the matplotlib package. Notice that the apple`DataFrameobject has a convenience method,plot()`, which makes creating plots easier.

import matplotlib.pyplot as plt   # Import matplotlib
# This line is necessary for the plot to appear in a Jupyter notebook
%matplotlib inline
# Control the default size of figures in this Jupyter notebook
%pylab inline
pylab.rcParams['figure.figsize'] = (15, 9)   # Change the size of plots
 
apple["Adj Close"].plot(grid = True) # Plot the adjusted closing price of AAPL


Populating the interactive namespace from numpy and matplotlib

A linechart is fine, but there are at least four variables involved for each date (open, high, low, and close), and we would like to have some visual way to see all four variables that does not require plotting four separate lines. Financial data is often plotted with a Japanese candlestick plot, so named because it was first created by 18th century Japanese rice traders. Such a chart can be created with matplotlib, though it requires considerable effort.

I have made a function you are welcome to use to more easily create candlestick charts from pandas data frames, and use it to plot our stock data. (Code is based off this example, and you can read the documentation for the functions involved here.)

from matplotlib.dates import DateFormatter, WeekdayLocator,\
    DayLocator, MONDAY
from matplotlib.finance import candlestick_ohlc
 
def pandas_candlestick_ohlc(dat, stick = "day", otherseries = None):
    """
    :param dat: pandas DataFrame object with datetime64 index, and float columns "Open", "High", "Low", and "Close", likely created via DataReader from "yahoo"
    :param stick: A string or number indicating the period of time covered by a single candlestick. Valid string inputs include "day", "week", "month", and "year", ("day" default), and any numeric input indicates the number of trading days included in a period
    :param otherseries: An iterable that will be coerced into a list, containing the columns of dat that hold other series to be plotted as lines
 
    This will show a Japanese candlestick plot for stock data stored in dat, also plotting other series if passed.
    """
    mondays = WeekdayLocator(MONDAY)        # major ticks on the mondays
    alldays = DayLocator()              # minor ticks on the days
    dayFormatter = DateFormatter('%d')      # e.g., 12
 
    # Create a new DataFrame which includes OHLC data for each period specified by stick input
    transdat = dat.loc[:,["Open", "High", "Low", "Close"]]
    if (type(stick) == str):
        if stick == "day":
            plotdat = transdat
            stick = 1 # Used for plotting
        elif stick in ["week", "month", "year"]:
            if stick == "week":
                transdat["week"] = pd.to_datetime(transdat.index).map(lambda x: x.isocalendar()[1]) # Identify weeks
            elif stick == "month":
                transdat["month"] = pd.to_datetime(transdat.index).map(lambda x: x.month) # Identify months
            transdat["year"] = pd.to_datetime(transdat.index).map(lambda x: x.isocalendar()[0]) # Identify years
            grouped = transdat.groupby(list(set(["year",stick]))) # Group by year and other appropriate variable
            plotdat = pd.DataFrame({"Open": [], "High": [], "Low": [], "Close": []}) # Create empty data frame containing what will be plotted
            for name, group in grouped:
                plotdat = plotdat.append(pd.DataFrame({"Open": group.iloc[0,0],
                                            "High": max(group.High),
                                            "Low": min(group.Low),
                                            "Close": group.iloc[-1,3]},
                                           index = [group.index[0]]))
            if stick == "week": stick = 5
            elif stick == "month": stick = 30
            elif stick == "year": stick = 365
 
    elif (type(stick) == int and stick >= 1):
        transdat["stick"] = [np.floor(i / stick) for i in range(len(transdat.index))]
        grouped = transdat.groupby("stick")
        plotdat = pd.DataFrame({"Open": [], "High": [], "Low": [], "Close": []}) # Create empty data frame containing what will be plotted
        for name, group in grouped:
            plotdat = plotdat.append(pd.DataFrame({"Open": group.iloc[0,0],
                                        "High": max(group.High),
                                        "Low": min(group.Low),
                                        "Close": group.iloc[-1,3]},
                                       index = [group.index[0]]))
 
    else:
        raise ValueError('Valid inputs to argument "stick" include the strings "day", "week", "month", "year", or a positive integer')
 
 
    # Set plot parameters, including the axis object ax used for plotting
    fig, ax = plt.subplots()
    fig.subplots_adjust(bottom=0.2)
    if plotdat.index[-1] - plotdat.index[0] < pd.Timedelta('730 days'):
        weekFormatter = DateFormatter('%b %d')  # e.g., Jan 12
        ax.xaxis.set_major_locator(mondays)
        ax.xaxis.set_minor_locator(alldays)
    else:
        weekFormatter = DateFormatter('%b %d, %Y')
    ax.xaxis.set_major_formatter(weekFormatter)
 
    ax.grid(True)
 
    # Create the candelstick chart
    candlestick_ohlc(ax, list(zip(list(date2num(plotdat.index.tolist())), plotdat["Open"].tolist(), plotdat["High"].tolist(),
                      plotdat["Low"].tolist(), plotdat["Close"].tolist())),
                      colorup = "black", colordown = "red", width = stick * .4)
 
    # Plot other series (such as moving averages) as lines
    if otherseries != None:
        if type(otherseries) != list:
            otherseries = [otherseries]
        dat.loc[:,otherseries].plot(ax = ax, lw = 1.3, grid = True)
 
    ax.xaxis_date()
    ax.autoscale_view()
    plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
 
    plt.show()
 
pandas_candlestick_ohlc(apple)

With a candlestick chart, a black candlestick indicates a day where the closing price was higher than the open (a gain), while a red candlestick indicates a day where the open was higher than the close (a loss). The wicks indicate the high and the low, and the body the open and close (hue is used to determine which end of the body is the open and which the close). Candlestick charts are popular in finance and some strategies in technical analysis use them to make trading decisions, depending on the shape, color, and position of the candles. I will not cover such strategies today.

We may wish to plot multiple financial instruments together; we may want to compare stocks, compare them to the market, or look at other securities such as exchange-traded funds (ETFs). Later, we will also want to see how to plot a financial instrument against some indicator, like a moving average. For this you would rather use a line chart than a candlestick chart. (How would you plot multiple candlestick charts on top of one another without cluttering the chart?)

Below, I get stock data for some other tech companies and plot their adjusted close together.

microsoft = web.DataReader("MSFT", "yahoo", start, end)
google = web.DataReader("GOOG", "yahoo", start, end)
 
# Below I create a DataFrame consisting of the adjusted closing price of these stocks, first by making a list of these objects and using the join method
stocks = pd.DataFrame({"AAPL": apple["Adj Close"],
                      "MSFT": microsoft["Adj Close"],
                      "GOOG": google["Adj Close"]})
 
stocks.head()

	AAPL	GOOG	MSFT
Date
2016-01-04	103.586180	741.840027	53.696756
2016-01-05	100.990380	742.580017	53.941723
2016-01-06	99.014030	743.619995	52.961855
2016-01-07	94.835186	726.390015	51.119702
2016-01-08	95.336649	714.469971	51.276485

1	stocks.plot(grid = True)

What’s wrong with this chart? While absolute price is important (pricy stocks are difficult to purchase, which affects not only their volatility but your ability to trade that stock), when trading, we are more concerned about the relative change of an asset rather than its absolute price. Google’s stocks are much more expensive than Apple’s or Microsoft’s, and this difference makes Apple’s and Microsoft’s stocks appear much less volatile than they truly are.

One solution would be to use two different scales when plotting the data; one scale will be used by Apple and Microsoft stocks, and the other by Google.

1	stocks.plot(secondary_y = ["AAPL", "MSFT"], grid = True)

A “better” solution, though, would be to plot the information we actually want: the stock’s returns. This involves transforming the data into something more useful for our purposes. There are multiple transformations we could apply.

One transformation would be to consider the stock’s return since the beginning of the period of interest. In other words, we plot:

$\text{return}_{t,0} = \frac{\text{price}_t}{\text{price}_0}$

This will require transforming the data in the stocks object, which I do next.

# df.apply(arg) will apply the function arg to each column in df, and return a DataFrame with the result
# Recall that lambda x is an anonymous function accepting parameter x; in this case, x will be a pandas Series object
stock_return = stocks.apply(lambda x: x / x[0])
stock_return.head()

	AAPL	GOOG	MSFT
Date
2016-01-04	1.000000	1.000000	1.000000
2016-01-05	0.974941	1.000998	1.004562
2016-01-06	0.955861	1.002399	0.986314
2016-01-07	0.915520	0.979173	0.952007
2016-01-08	0.920361	0.963105	0.954927

1	stock_return.plot(grid = True).axhline(y = 1, color = "black", lw = 2)

This is a much more useful plot. We can now see how profitable each stock was since the beginning of the period. Furthermore, we see that these stocks are highly correlated; they generally move in the same direction, a fact that was difficult to see in the other charts.

Alternatively, we could plot the change of each stock per day. One way to do so would be to plot the percentage increase of a stock when comparing day $t$ to day $t + 1$, with the formula:

$\text{growth}_t = \frac{\text{price}_{t + 1} - \text{price}_t}{\text{price}_t}$

But change could be thought of differently as:

$\text{increase}_t = \frac{\text{price}_{t} - \text{price}_{t-1}}{\text{price}_t}$

These formulas are not the same and can lead to differing conclusions, but there is another way to model the growth of a stock: with log differences.

$\text{change}_t = \log(\text{price}_{t}) - \log(\text{price}_{t - 1})$

(Here, $\log$ is the natural log, and our definition does not depend as strongly on whether we use $\log(\text{price}_{t}) - \log(\text{price}_{t - 1})$ or $\log(\text{price}_{t+1}) - \log(\text{price}_{t})$ .) The advantage of using log differences is that this difference can be interpreted as the percentage change in a stock but does not depend on the denominator of a fraction.

We can obtain and plot the log differences of the data in stocks as follows:

# Let's use NumPy's log function, though math's log function would work just as well
import numpy as np
 
stock_change = stocks.apply(lambda x: np.log(x) - np.log(x.shift(1))) # shift moves dates back by 1.
stock_change.head()

	AAPL	GOOG	MSFT
Date
2016-01-04	NaN	NaN	NaN
2016-01-05	-0.025379	0.000997	0.004552
2016-01-06	-0.019764	0.001400	-0.018332
2016-01-07	-0.043121	-0.023443	-0.035402
2016-01-08	0.005274	-0.016546	0.003062

1	stock_change.plot(grid = True).axhline(y = 0, color = "black", lw = 2)

Which transformation do you prefer? Looking at returns since the beginning of the period make the overall trend of the securities in question much more apparent. Changes between days, though, are what more advanced methods actually consider when modelling the behavior of a stock. so they should not be ignored.

Moving Averages

Charts are very useful. In fact, some traders base their strategies almost entirely off charts (these are the “technicians”, since trading strategies based off finding patterns in charts is a part of the trading doctrine known astechnical analysis). Let’s now consider how we can find trends in stocks.

A $q$ -day moving average is, for a series $x_t$ and a point in time $t$ , the average of the past $q$ days: that is, if $MA^q_t$ denotes a moving average process, then:

$MA^q_t = \frac{1}{q} \sum_{i = 0}^{q-1} x_{t - i}$ latex

Moving averages smooth a series and helps identify trends. The larger $q$ is, the less responsive a moving average process is to short-term fluctuations in the series $x_t$ . The idea is that moving average processes help identify trends from “noise”. Fast moving averages have smaller $q$ and more closely follow the stock, while slow moving averages have larger $q$ , resulting in them responding less to the fluctuations of the stock and being more stable.

pandas provides functionality for easily computing moving averages. I demonstrate its use by creating a 20-day (one month) moving average for the Apple data, and plotting it alongside the stock.

1 2	apple["20d"] = np.round(apple["Close"].rolling(window = 20, center = False).mean(), 2) pandas_candlestick_ohlc(apple.loc['2016-01-04':'2016-08-07',:], otherseries = "20d")

Notice how late the rolling average begins. It cannot be computed until 20 days have passed. This limitation becomes more severe for longer moving averages. Because I would like to be able to compute 200-day moving averages, I’m going to extend out how much AAPL data we have. That said, we will still largely focus on 2016.

start = datetime.datetime(2010,1,1)
apple = web.DataReader("AAPL", "yahoo", start, end)
apple["20d"] = np.round(apple["Close"].rolling(window = 20, center = False).mean(), 2)
 
pandas_candlestick_ohlc(apple.loc['2016-01-04':'2016-08-07',:], otherseries = "20d")

You will notice that a moving average is much smoother than the actua stock data. Additionally, it’s a stubborn indicator; a stock needs to be above or below the moving average line in order for the line to change direction. Thus, crossing a moving average signals a possible change in trend, and should draw attention.

Traders are usually interested in multiple moving averages, such as the 20-day, 50-day, and 200-day moving averages. It’s easy to examine multiple moving averages at once.

apple["50d"] = np.round(apple["Close"].rolling(window = 50, center = False).mean(), 2)
apple["200d"] = np.round(apple["Close"].rolling(window = 200, center = False).mean(), 2)
 
pandas_candlestick_ohlc(apple.loc['2016-01-04':'2016-08-07',:], otherseries = ["20d", "50d", "200d"])

The 20-day moving average is the most sensitive to local changes, and the 200-day moving average the least. Here, the 200-day moving average indicates an overall bearish trend: the stock is trending downward over time. The 20-day moving average is at times bearish and at other timesbullish, where a positive swing is expected. You can also see that the crossing of moving average lines indicate changes in trend. These crossings are what we can use as trading signals, or indications that a financial security is changing direction and a profitable trade might be made.

Visit next week to read about how to design and test a trading strategy using moving averages.

Update: An earlier version of this article suggested that algorithmic trading was synonymous as high-frequency trading. As pointed out in the comments by dissolved, this need not be the case; algorithms can be used to identify trades without necessarily being high frequency. While HFT is a large subset of algorithmic trading, it is not equal to it.

PART II

*This post is the second in a two-part series on stock data analysis using Python, based on a lecture I gave on the subject for MATH 3900 (Data Mining) at the University of Utah (read part 1 here). In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance usingpandas, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking. This second post discusses topics including divising a moving average crossover strategy, backtesting, and benchmarking, along with practice problems for readers to ponder.

Trading Strategy

Call an open position a trade that will be terminated in the future when a condition is met. A long position is one in which a profit is made if the financial instrument traded increases in value, and a short position is on in which a profit is made if the financial asset being traded decreases in value. When trading stocks directly, all long positions are bullish and all short position are bearish. That said, a bullish attitude need not be accompanied by a long position, and a bearish attitude need not be accompanied by a short position (this is particularly true when trading stock options).

Here is an example. Let’s say you buy a stock with the expectation that the stock will increase in value, with a plan to sell the stock at a higher price. This is a long position: you are holding a financial asset for which you will profit if the asset increases in value. Your potential profit is unlimited, and your potential losses are limited by the price of the stock since stock prices never go below zero. On the other hand, if you expect a stock to decrease in value, you may borrow the stock from a brokerage firm and sell it, with the expectation of buying the stock back later at a lower price, thus earning you a profit. This is called shorting a stock, and is a short position, since you will earn a profit if the stock drops in value. The potential profit from shorting a stock is limited by the price of the stock (the best you can do is have the stock become worth nothing; you buy it back for free), while the losses are unlimited, since you could potentially spend an arbitrarily large amount of money to buy the stock back. Thus, a broker will expect an investor to be in a very good financial position before allowing the investor to short a stock.

Any trader must have a set of rules that determine how much of her money she is willing to bet on any single trade. For example, a trader may decide that under no circumstances will she risk more than 10% of her portfolio on a trade. Additionally, in any trade, a trader must have an exit strategy, a set of conditions determining when she will exit the position, for either profit or loss. A trader may set a target, which is the minimum profit that will induce the trader to leave the position. Likewise, a trader must have a maximum loss she is willing to tolerate; if potential losses go beyond this amount, the trader will exit the position in order to prevent any further loss (this is usually done by setting a stop-loss order, an order that is triggered to prevent further losses).

We will call a plan that includes trading signals for prompting trades, a rule for deciding how much of the portfolio to risk on any particular strategy, and a complete exit strategy for any trade an overall trading strategy. Our concern now is to design and evaluate trading strategies.

We will suppose that the amount of money in the portfolio involved in any particular trade is a fixed proportion; 10% seems like a good number. We will also say that for any trade, if losses exceed 20% of the value of the trade, we will exit the position. Now we need a means for deciding when to enter position and when to exit for a profit.

Here, I will be demonstrating a moving average crossover strategy. We will use two moving averages, one we consider “fast”, and the other “slow”. The strategy is:

Trade the asset when the fast moving average crosses over the slow moving average.
Exit the trade when the fast moving average crosses over the slow moving average again.

A long trade will be prompted when the fast moving average crosses from below to above the slow moving average, and the trade will be exited when the fast moving average crosses below the slow moving average later. A short trade will be prompted when the fast moving average crosses below the slow moving average, and the trade will be exited when the fast moving average later crosses above the slow moving average.

We now have a complete strategy. But before we decide we want to use it, we should try to evaluate the quality of the strategy first. The usual means for doing so is backtesting, which is looking at how profitable the strategy is on historical data. For example, looking at the above chart’s performance on Apple stock, if the 20-day moving average is the fast moving average and the 50-day moving average the slow, this strategy does not appear to be very profitable, at least not if you are always taking long positions.

Let’s see if we can automate the backtesting task. We first identify when the 20-day average is below the 50-day average, and vice versa.

1 2	apple['20d-50d'] = apple['20d'] - apple['50d'] apple.tail()

	Open	High	Low	Close	Volume	Adj Close	20d	50d	200d	20d-50d
Date
2016-08-26	107.410004	107.949997	106.309998	106.940002	27766300	106.940002	107.87	101.51	102.73	6.36
2016-08-29	106.620003	107.440002	106.290001	106.820000	24970300	106.820000	107.91	101.74	102.68	6.17
2016-08-30	105.800003	106.500000	105.500000	106.000000	24863900	106.000000	107.98	101.96	102.63	6.02
2016-08-31	105.660004	106.570000	105.639999	106.099998	29662400	106.099998	108.00	102.16	102.60	5.84
2016-09-01	106.139999	106.800003	105.620003	106.730003	26643600	106.730003	108.04	102.39	102.56	5.65

We will refer to the sign of this difference as the regime; that is, if the fast moving average is above the slow moving average, this is a bullish regime (the bulls rule), and a bearish regime (the bears rule) holds when the fast moving average is below the slow moving average. I identify regimes with the following code.

# np.where() is a vectorized if-else function, where a condition is checked for each component of a vector, and the first argument passed is used when the condition holds, and the other passed if it does not
apple["Regime"] = np.where(apple['20d-50d'] > 0, 1, 0)
# We have 1's for bullish regimes and 0's for everything else. Below I replace bearish regimes's values with -1, and to maintain the rest of the vector, the second argument is apple["Regime"]
apple["Regime"] = np.where(apple['20d-50d'] < 0, -1, apple["Regime"])
apple.loc['2016-01-01':'2016-08-07',"Regime"].plot(ylim = (-2,2)).axhline(y = 0, color = "black", lw = 2)

1	apple["Regime"].plot(ylim = (-2,2)).axhline(y = 0, color = "black", lw = 2)

1	apple["Regime"].value_counts()

 1    966
-1    663
 0     50
Name: Regime, dtype: int64

The last line above indicates that for 1005 days the market was bearish on Apple, while for 600 days the market was bullish, and it was neutral for 54 days.

Trading signals appear at regime changes. When a bullish regime begins, a buy signal is triggered, and when it ends, a sell signal is triggered. Likewise, when a bearish regime begins, a sell signal is triggered, and when the regime ends, a buy signal is triggered (this is of interest only if you ever will short the stock, or use some derivative like a stock option to bet against the market).

It’s simple to obtain signals. Let $r_t$ indicate the regime at time $t$ , and $s_t$ the signal at time $t$ . Then:

$s_t = \text{sign}(r_t - r_{t - 1})$

$s_t \in \{-1, 0, 1\}$ , with $-1$ indicating “sell”, $1$ indicating “buy”, and $0$ no action. We can obtain signals like so:

# To ensure that all trades close out, I temporarily change the regime of the last row to 0
regime_orig = apple.ix[-1, "Regime"]
apple.ix[-1, "Regime"] = 0
apple["Signal"] = np.sign(apple["Regime"] - apple["Regime"].shift(1))
# Restore original regime data
apple.ix[-1, "Regime"] = regime_orig
apple.tail()

	Open	High	Low	Close	Volume	Adj Close	20d	50d	200d	20d-50d	Regime	Signal
Date
2016-08-26	107.410004	107.949997	106.309998	106.940002	27766300	106.940002	107.87	101.51	102.73	6.36	1.0	0.0
2016-08-29	106.620003	107.440002	106.290001	106.820000	24970300	106.820000	107.91	101.74	102.68	6.17	1.0	0.0
2016-08-30	105.800003	106.500000	105.500000	106.000000	24863900	106.000000	107.98	101.96	102.63	6.02	1.0	0.0
2016-08-31	105.660004	106.570000	105.639999	106.099998	29662400	106.099998	108.00	102.16	102.60	5.84	1.0	0.0
2016-09-01	106.139999	106.800003	105.620003	106.730003	26643600	106.730003	108.04	102.39	102.56	5.65	1.0	-1.0

1	apple["Signal"].plot(ylim = (-2, 2))

1	apple["Signal"].value_counts()

 0.0    1637
-1.0      21
 1.0      20
Name: Signal, dtype: int64

We would buy Apple stock 23 times and sell Apple stock 23 times. If we only go long on Apple stock, only 23 trades will be engaged in over the 6-year period, while if we pivot from a long to a short position every time a long position is terminated, we would engage in 23 trades total. (Bear in mind that trading more frequently isn’t necessarily good; trades are never free.)

You may notice that the system as it currently stands isn’t very robust, since even a fleeting moment when the fast moving average is above the slow moving average triggers a trade, resulting in trades that end immediately (which is bad if not simply because realistically every trade is accompanied by a fee that can quickly erode earnings). Additionally, every bullish regime immediately transitions into a bearish regime, and if you were constructing trading systems that allow both bullish and bearish bets, this would lead to the end of one trade immediately triggering a new trade that bets on the market in the opposite direction, which again seems finnicky. A better system would require more evidence that the market is moving in some particular direction. But we will not concern ourselves with these details for now.

Let’s now try to identify what the prices of the stock is at every buy and every sell.

apple.loc[apple["Signal"] == 1, "Close"]


Date
2010-03-16    224.449997
2010-06-18    274.070011
2010-09-20    283.230007
2011-05-12    346.569988
2011-07-14    357.770004
2011-12-28    402.640003
2012-06-25    570.770020
2013-05-17    433.260010
2013-07-31    452.529984
2013-10-16    501.110001
2014-03-26    539.779991
2014-04-25    571.939980
2014-08-18     99.160004
2014-10-28    106.739998
2015-02-05    119.940002
2015-04-28    130.559998
2015-10-27    114.550003
2016-03-11    102.260002
2016-07-01     95.889999
2016-07-25     97.339996
Name: Close, dtype: float64

apple.loc[apple["Signal"] == -1, "Close"]


Date
2010-06-11    253.509995
2010-07-22    259.020000
2011-03-30    348.630009
2011-03-31    348.510006
2011-05-27    337.409992
2011-11-17    377.410000
2012-05-09    569.180023
2012-10-17    644.610001
2013-06-26    398.069992
2013-10-03    483.409996
2014-01-28    506.499977
2014-04-22    531.700020
2014-06-11     93.860001
2014-10-17     97.669998
2015-01-05    106.250000
2015-04-16    126.169998
2015-06-25    127.500000
2015-12-18    106.029999
2016-05-05     93.239998
2016-07-08     96.680000
2016-09-01    106.730003
Name: Close, dtype: float64

# Create a DataFrame with trades, including the price at the trade and the regime under which the trade is made.
apple_signals = pd.concat([
        pd.DataFrame({"Price": apple.loc[apple["Signal"] == 1, "Close"],
                     "Regime": apple.loc[apple["Signal"] == 1, "Regime"],
                     "Signal": "Buy"}),
        pd.DataFrame({"Price": apple.loc[apple["Signal"] == -1, "Close"],
                     "Regime": apple.loc[apple["Signal"] == -1, "Regime"],
                     "Signal": "Sell"}),
    ])
apple_signals.sort_index(inplace = True)
apple_signals

	Price	Regime	Signal
Date
2010-03-16	224.449997	1.0	Buy
2010-06-11	253.509995	-1.0	Sell
2010-06-18	274.070011	1.0	Buy
2010-07-22	259.020000	-1.0	Sell
2010-09-20	283.230007	1.0	Buy
2011-03-30	348.630009	0.0	Sell
2011-03-31	348.510006	-1.0	Sell
2011-05-12	346.569988	1.0	Buy
2011-05-27	337.409992	-1.0	Sell
2011-07-14	357.770004	1.0	Buy
2011-11-17	377.410000	-1.0	Sell
2011-12-28	402.640003	1.0	Buy
2012-05-09	569.180023	-1.0	Sell
2012-06-25	570.770020	1.0	Buy
2012-10-17	644.610001	-1.0	Sell
2013-05-17	433.260010	1.0	Buy
2013-06-26	398.069992	-1.0	Sell
2013-07-31	452.529984	1.0	Buy
2013-10-03	483.409996	-1.0	Sell
2013-10-16	501.110001	1.0	Buy
2014-01-28	506.499977	-1.0	Sell
2014-03-26	539.779991	1.0	Buy
2014-04-22	531.700020	-1.0	Sell
2014-04-25	571.939980	1.0	Buy
2014-06-11	93.860001	-1.0	Sell
2014-08-18	99.160004	1.0	Buy
2014-10-17	97.669998	-1.0	Sell
2014-10-28	106.739998	1.0	Buy
2015-01-05	106.250000	-1.0	Sell
2015-02-05	119.940002	1.0	Buy
2015-04-16	126.169998	-1.0	Sell
2015-04-28	130.559998	1.0	Buy
2015-06-25	127.500000	-1.0	Sell
2015-10-27	114.550003	1.0	Buy
2015-12-18	106.029999	-1.0	Sell
2016-03-11	102.260002	1.0	Buy
2016-05-05	93.239998	-1.0	Sell
2016-07-01	95.889999	1.0	Buy
2016-07-08	96.680000	-1.0	Sell
2016-07-25	97.339996	1.0	Buy
2016-09-01	106.730003	1.0	Sell

# Let's see the profitability of long trades
apple_long_profits = pd.DataFrame({
        "Price": apple_signals.loc[(apple_signals["Signal"] == "Buy") &
                                  apple_signals["Regime"] == 1, "Price"],
        "Profit": pd.Series(apple_signals["Price"] - apple_signals["Price"].shift(1)).loc[
            apple_signals.loc[(apple_signals["Signal"].shift(1) == "Buy") & (apple_signals["Regime"].shift(1) == 1)].index
        ].tolist(),
        "End Date": apple_signals["Price"].loc[
            apple_signals.loc[(apple_signals["Signal"].shift(1) == "Buy") & (apple_signals["Regime"].shift(1) == 1)].index
        ].index
    })
apple_long_profits

	End Date	Price	Profit
Date
2010-03-16	2010-06-11	224.449997	29.059998
2010-06-18	2010-07-22	274.070011	-15.050011
2010-09-20	2011-03-30	283.230007	65.400002
2011-05-12	2011-05-27	346.569988	-9.159996
2011-07-14	2011-11-17	357.770004	19.639996
2011-12-28	2012-05-09	402.640003	166.540020
2012-06-25	2012-10-17	570.770020	73.839981
2013-05-17	2013-06-26	433.260010	-35.190018
2013-07-31	2013-10-03	452.529984	30.880012
2013-10-16	2014-01-28	501.110001	5.389976
2014-03-26	2014-04-22	539.779991	-8.079971
2014-04-25	2014-06-11	571.939980	-478.079979
2014-08-18	2014-10-17	99.160004	-1.490006
2014-10-28	2015-01-05	106.739998	-0.489998
2015-02-05	2015-04-16	119.940002	6.229996
2015-04-28	2015-06-25	130.559998	-3.059998
2015-10-27	2015-12-18	114.550003	-8.520004
2016-03-11	2016-05-05	102.260002	-9.020004
2016-07-01	2016-07-08	95.889999	0.790001
2016-07-25	2016-09-01	97.339996	9.390007

Above, we can see that on May 17th, 2013, there was a massive drop in the price of Apple stock, and it looks like our trading system would do badly. But this price drop is not because of a massive shock to Apple, but simply due to a stock split. And while dividend payments are not as obvious as a stock split, they may be affecting the performance of our system.

1 2	# Let's see the result over the whole period for which we have Apple data pandas_candlestick_ohlc(apple, stick = 45, otherseries = ["20d", "50d", "200d"])

We don’t want our trading system to be behaving poorly because of stock splits and dividend payments. How should we handle this? One approach would be to obtain historical stock split and dividend payment data and design a trading system for handling these. This would most realistically represent the behavior of the stock and could be considered the best solution, but it is more complicated. Another solution would be to adjust the prices to account for stock splits and dividend payments.

Yahoo! Finance only provides the adjusted closing price of a stock, but this is all we need to get adjusted opening, high, and low prices. The adjusted close is computed like so:

$\text{price}^{\text{adj}}_t = m_t \times \text{price}_t$

where $m_t$ is the multiplier used for the adjustment. Solving for $m_t$ requires only division and thus we can use the closing price and the adjusted closing price to adjust all prices in the series.

Let’s go back, adjust the apple data, and reevaluate our trading system using the adjusted data.

def ohlc_adj(dat):
    """
    :param dat: pandas DataFrame with stock data, including "Open", "High", "Low", "Close", and "Adj Close", with "Adj Close" containing adjusted closing prices
 
    :return: pandas DataFrame with adjusted stock data
 
    This function adjusts stock data for splits, dividends, etc., returning a data frame with
    "Open", "High", "Low" and "Close" columns. The input DataFrame is similar to that returned
    by pandas Yahoo! Finance API.
    """
    return pd.DataFrame({"Open": dat["Open"] * dat["Adj Close"] / dat["Close"],
                       "High": dat["High"] * dat["Adj Close"] / dat["Close"],
                       "Low": dat["Low"] * dat["Adj Close"] / dat["Close"],
                       "Close": dat["Adj Close"]})
 
apple_adj = ohlc_adj(apple)
 
# This next code repeats all the earlier analysis we did on the adjusted data
 
apple_adj["20d"] = np.round(apple_adj["Close"].rolling(window = 20, center = False).mean(), 2)
apple_adj["50d"] = np.round(apple_adj["Close"].rolling(window = 50, center = False).mean(), 2)
apple_adj["200d"] = np.round(apple_adj["Close"].rolling(window = 200, center = False).mean(), 2)
 
apple_adj['20d-50d'] = apple_adj['20d'] - apple_adj['50d']
# np.where() is a vectorized if-else function, where a condition is checked for each component of a vector, and the first argument passed is used when the condition holds, and the other passed if it does not
apple_adj["Regime"] = np.where(apple_adj['20d-50d'] > 0, 1, 0)
# We have 1's for bullish regimes and 0's for everything else. Below I replace bearish regimes's values with -1, and to maintain the rest of the vector, the second argument is apple["Regime"]
apple_adj["Regime"] = np.where(apple_adj['20d-50d'] < 0, -1, apple_adj["Regime"])
# To ensure that all trades close out, I temporarily change the regime of the last row to 0
regime_orig = apple_adj.ix[-1, "Regime"]
apple_adj.ix[-1, "Regime"] = 0
apple_adj["Signal"] = np.sign(apple_adj["Regime"] - apple_adj["Regime"].shift(1))
# Restore original regime data
apple_adj.ix[-1, "Regime"] = regime_orig
 
# Create a DataFrame with trades, including the price at the trade and the regime under which the trade is made.
apple_adj_signals = pd.concat([
        pd.DataFrame({"Price": apple_adj.loc[apple_adj["Signal"] == 1, "Close"],
                     "Regime": apple_adj.loc[apple_adj["Signal"] == 1, "Regime"],
                     "Signal": "Buy"}),
        pd.DataFrame({"Price": apple_adj.loc[apple_adj["Signal"] == -1, "Close"],
                     "Regime": apple_adj.loc[apple_adj["Signal"] == -1, "Regime"],
                     "Signal": "Sell"}),
    ])
apple_adj_signals.sort_index(inplace = True)
apple_adj_long_profits = pd.DataFrame({
        "Price": apple_adj_signals.loc[(apple_adj_signals["Signal"] == "Buy") &
                                  apple_adj_signals["Regime"] == 1, "Price"],
        "Profit": pd.Series(apple_adj_signals["Price"] - apple_adj_signals["Price"].shift(1)).loc[
            apple_adj_signals.loc[(apple_adj_signals["Signal"].shift(1) == "Buy") & (apple_adj_signals["Regime"].shift(1) == 1)].index
        ].tolist(),
        "End Date": apple_adj_signals["Price"].loc[
            apple_adj_signals.loc[(apple_adj_signals["Signal"].shift(1) == "Buy") & (apple_adj_signals["Regime"].shift(1) == 1)].index
        ].index
    })
 
pandas_candlestick_ohlc(apple_adj, stick = 45, otherseries = ["20d", "50d", "200d"])

1	apple_adj_long_profits

	End Date	Price	Profit
Date
2010-03-16	2010-06-10	29.355667	3.408371
2010-06-18	2010-07-22	35.845436	-1.968381
2010-09-20	2011-03-30	37.043466	8.553623
2011-05-12	2011-05-27	45.327660	-1.198030
2011-07-14	2011-11-17	46.792503	2.568702
2011-12-28	2012-05-09	52.661020	21.781659
2012-06-25	2012-10-17	74.650634	10.019459
2013-05-17	2013-06-26	57.882798	-4.701326
2013-07-31	2013-10-04	60.457234	4.500835
2013-10-16	2014-01-28	67.389473	1.122523
2014-03-11	2014-03-17	72.948554	-1.272298
2014-03-24	2014-04-22	73.370393	-1.019203
2014-04-25	2014-10-17	77.826851	16.191371
2014-10-28	2015-01-05	102.749105	-0.028185
2015-02-05	2015-04-16	116.413846	6.046838
2015-04-28	2015-06-26	126.721620	-3.184117
2015-10-27	2015-12-18	112.152083	-7.897288
2016-03-10	2016-05-05	100.015950	-7.278331
2016-06-23	2016-06-27	95.582210	-4.038123
2016-06-30	2016-07-11	95.084904	1.372569
2016-07-25	2016-09-01	96.815526	9.914477

As you can see, adjusting for dividends and stock splits makes a big difference. We will use this data from now on.

Let’s now create a simulated portfolio of $1,000,000, and see how it would behave, according to the rules we have established. This includes:

Investing only 10% of the portfolio in any trade
Exiting the position if losses exceed 20% of the value of the trade.

When simulating, bear in mind that:

Trades are done in batches of 100 stocks.
Our stop-loss rule involves placing an order to sell the stock the moment the price drops below the specified level. Thus we need to check whether the lows during this period ever go low enough to trigger the stop-loss. Realistically, unless we buy a put option, we cannot guarantee that we will sell the stock at the price we set at the stop-loss, but we will use this as the selling price anyway for the sake of simplicity.
Every trade is accompanied by a commission to the broker, which should be accounted for. I do not do so here.

Here’s how a backtest may look:

# We need to get the low of the price during each trade.
tradeperiods = pd.DataFrame({"Start": apple_adj_long_profits.index,
                            "End": apple_adj_long_profits["End Date"]})
apple_adj_long_profits["Low"] = tradeperiods.apply(lambda x: min(apple_adj.loc[x["Start"]:x["End"], "Low"]), axis = 1)
apple_adj_long_profits

	End Date	Price	Profit	Low
Date
2010-03-16	2010-06-10	29.355667	3.408371	26.059775
2010-06-18	2010-07-22	35.845436	-1.968381	31.337127
2010-09-20	2011-03-30	37.043466	8.553623	35.967068
2011-05-12	2011-05-27	45.327660	-1.198030	43.084626
2011-07-14	2011-11-17	46.792503	2.568702	46.171251
2011-12-28	2012-05-09	52.661020	21.781659	52.382438
2012-06-25	2012-10-17	74.650634	10.019459	73.975759
2013-05-17	2013-06-26	57.882798	-4.701326	52.859502
2013-07-31	2013-10-04	60.457234	4.500835	60.043080
2013-10-16	2014-01-28	67.389473	1.122523	67.136651
2014-03-11	2014-03-17	72.948554	-1.272298	71.167335
2014-03-24	2014-04-22	73.370393	-1.019203	69.579335
2014-04-25	2014-10-17	77.826851	16.191371	76.740971
2014-10-28	2015-01-05	102.749105	-0.028185	101.411076
2015-02-05	2015-04-16	116.413846	6.046838	114.948237
2015-04-28	2015-06-26	126.721620	-3.184117	119.733299
2015-10-27	2015-12-18	112.152083	-7.897288	104.038477
2016-03-10	2016-05-05	100.015950	-7.278331	91.345994
2016-06-23	2016-06-27	95.582210	-4.038123	91.006996
2016-06-30	2016-07-11	95.084904	1.372569	93.791913
2016-07-25	2016-09-01	96.815526	9.914477	95.900485

# Now we have all the information needed to simulate this strategy in apple_adj_long_profits
cash = 1000000
apple_backtest = pd.DataFrame({"Start Port. Value": [],
                         "End Port. Value": [],
                         "End Date": [],
                         "Shares": [],
                         "Share Price": [],
                         "Trade Value": [],
                         "Profit per Share": [],
                         "Total Profit": [],
                         "Stop-Loss Triggered": []})
port_value = .1  # Max proportion of portfolio bet on any trade
batch = 100      # Number of shares bought per batch
stoploss = .2    # % of trade loss that would trigger a stoploss
for index, row in apple_adj_long_profits.iterrows():
    batches = np.floor(cash * port_value) // np.ceil(batch * row["Price"]) # Maximum number of batches of stocks invested in
    trade_val = batches * batch * row["Price"] # How much money is put on the line with each trade
    if row["Low"] < (1 - stoploss) * row["Price"]:   # Account for the stop-loss
        share_profit = np.round((1 - stoploss) * row["Price"], 2)
        stop_trig = True
    else:
        share_profit = row["Profit"]
        stop_trig = False
    profit = share_profit * batches * batch # Compute profits
    # Add a row to the backtest data frame containing the results of the trade
    apple_backtest = apple_backtest.append(pd.DataFrame({
                "Start Port. Value": cash,
                "End Port. Value": cash + profit,
                "End Date": row["End Date"],
                "Shares": batch * batches,
                "Share Price": row["Price"],
                "Trade Value": trade_val,
                "Profit per Share": share_profit,
                "Total Profit": profit,
                "Stop-Loss Triggered": stop_trig
            }, index = [index]))
    cash = max(0, cash + profit)
 
apple_backtest

	End Date	End Port. Value	Profit per Share	Share Price	Shares	Start Port. Value	Total Profit	Trade Value
2010-03-16	2010-06-10	1.011588e+06	3.408371	29.355667	3400.0	1.000000e+06	11588.4614	99809.2678
2010-06-18	2010-07-22	1.006077e+06	-1.968381	35.845436	2800.0	1.011588e+06	-5511.4668	100367.2208
2010-09-20	2011-03-30	1.029172e+06	8.553623	37.043466	2700.0	1.006077e+06	23094.7821	100017.3582
2011-05-12	2011-05-27	1.026536e+06	-1.198030	45.327660	2200.0	1.029172e+06	-2635.6660	99720.8520
2011-07-14	2011-11-17	1.031930e+06	2.568702	46.792503	2100.0	1.026536e+06	5394.2742	98264.2563
2011-12-28	2012-05-09	1.073316e+06	21.781659	52.661020	1900.0	1.031930e+06	41385.1521	100055.9380
2012-06-25	2012-10-17	1.087343e+06	10.019459	74.650634	1400.0	1.073316e+06	14027.2426	104510.8876
2013-05-17	2013-06-26	1.078880e+06	-4.701326	57.882798	1800.0	1.087343e+06	-8462.3868	104189.0364
2013-07-31	2013-10-04	1.086532e+06	4.500835	60.457234	1700.0	1.078880e+06	7651.4195	102777.2978
2013-10-16	2014-01-28	1.088328e+06	1.122523	67.389473	1600.0	1.086532e+06	1796.0368	107823.1568
2014-03-11	2014-03-17	1.086547e+06	-1.272298	72.948554	1400.0	1.088328e+06	-1781.2172	102127.9756
2014-03-24	2014-04-22	1.085120e+06	-1.019203	73.370393	1400.0	1.086547e+06	-1426.8842	102718.5502
2014-04-25	2014-10-17	1.106169e+06	16.191371	77.826851	1300.0	1.085120e+06	21048.7823	101174.9063
2014-10-28	2015-01-05	1.106140e+06	-0.028185	102.749105	1000.0	1.106169e+06	-28.1850	102749.1050
2015-02-05	2015-04-16	1.111582e+06	6.046838	116.413846	900.0	1.106140e+06	5442.1542	104772.4614
2015-04-28	2015-06-26	1.109035e+06	-3.184117	126.721620	800.0	1.111582e+06	-2547.2936	101377.2960
2015-10-27	2015-12-18	1.101928e+06	-7.897288	112.152083	900.0	1.109035e+06	-7107.5592	100936.8747
2016-03-10	2016-05-05	1.093921e+06	-7.278331	100.015950	1100.0	1.101928e+06	-8006.1641	110017.5450
2016-06-23	2016-06-27	1.089480e+06	-4.038123	95.582210	1100.0	1.093921e+06	-4441.9353	105140.4310
2016-06-30	2016-07-11	1.090989e+06	1.372569	95.084904	1100.0	1.089480e+06	1509.8259	104593.3944
2016-07-25	2016-09-01	1.101895e+06	9.914477	96.815526	1100.0	1.090989e+06	10905.9247	106497.0786

1	apple_backtest["End Port. Value"].plot()

Our portfolio’s value grew by 10% in about six years. Considering that only 10% of the portfolio was ever involved in any single trade, this is not bad performance.

Notice that this strategy never lead to our stop-loss order being triggered. Does this mean we don’t need stop-loss orders? There is no simple answer to this. After all, if we had chosen a different level at which a stop-loss would be triggered, we may have seen it triggered.

Stop-loss orders are automatically triggered and ask no question as to why the order was triggered. This means that both a genuine change in trend or a momentary fluctuation can trigger a stop-loss, with the latter being the more concerning reason since not only do you have to pay for the order, there is no guarantee that you will sell the stock at the price you set, which could make your losses worse. Meanwhile, the trend on which you based your trade still holds, and had the stop-loss not been triggered, you may have made a profit. That said, a stop-loss can help you protect against your own emotions, staying wedded to a trade even though it has lost its value. They’re also good to have if you cannot monitor or quickly access your portfolio, like when you are on vacation.

I have provided links both for and “against” the use of stop-loss orders, but from now on I’m not going to require our backtesting system to account for them. While less realistic (and I do believe an industrial-strength system should account for a stop-loss rule), this simplifies the backtesting task.

A more realistic portfolio would not be betting 10% of its value on only one stock. A more realistic one would consider investing in multiple stocks. Multiple trades may be ongoing at any given time involving multiple companies, and most of the portfolio will be in stocks, not cash. Now that we will be investing in multiple stops and exiting only when moving averages cross (not because of a stop-loss), we will need to change our approach to backtesting. For example, we will be using one pandasDataFrame to contain all buy and sell orders for all stocks being considered, and our loop above will have to track more information.

I have written functions for creating order data for multiple stocks, and a function for performing the backtesting.

def ma_crossover_orders(stocks, fast, slow):
    """
    :param stocks: A list of tuples, the first argument in each tuple being a string containing the ticker symbol of each stock (or however you want the stock represented, so long as it's unique), and the second being a pandas DataFrame containing the stocks, with a "Close" column and indexing by date (like the data frames returned by the Yahoo! Finance API)
    :param fast: Integer for the number of days used in the fast moving average
    :param slow: Integer for the number of days used in the slow moving average
 
    :return: pandas DataFrame containing stock orders
 
    This function takes a list of stocks and determines when each stock would be bought or sold depending on a moving average crossover strategy, returning a data frame with information about when the stocks in the portfolio are bought or sold according to the strategy
    """
    fast_str = str(fast) + 'd'
    slow_str = str(slow) + 'd'
    ma_diff_str = fast_str + '-' + slow_str
 
    trades = pd.DataFrame({"Price": [], "Regime": [], "Signal": []})
    for s in stocks:
        # Get the moving averages, both fast and slow, along with the difference in the moving averages
        s[1][fast_str] = np.round(s[1]["Close"].rolling(window = fast, center = False).mean(), 2)
        s[1][slow_str] = np.round(s[1]["Close"].rolling(window = slow, center = False).mean(), 2)
        s[1][ma_diff_str] = s[1][fast_str] - s[1][slow_str]
 
        # np.where() is a vectorized if-else function, where a condition is checked for each component of a vector, and the first argument passed is used when the condition holds, and the other passed if it does not
        s[1]["Regime"] = np.where(s[1][ma_diff_str] > 0, 1, 0)
        # We have 1's for bullish regimes and 0's for everything else. Below I replace bearish regimes's values with -1, and to maintain the rest of the vector, the second argument is apple["Regime"]
        s[1]["Regime"] = np.where(s[1][ma_diff_str] < 0, -1, s[1]["Regime"])
        # To ensure that all trades close out, I temporarily change the regime of the last row to 0
        regime_orig = s[1].ix[-1, "Regime"]
        s[1].ix[-1, "Regime"] = 0
        s[1]["Signal"] = np.sign(s[1]["Regime"] - s[1]["Regime"].shift(1))
        # Restore original regime data
        s[1].ix[-1, "Regime"] = regime_orig
 
        # Get signals
        signals = pd.concat([
            pd.DataFrame({"Price": s[1].loc[s[1]["Signal"] == 1, "Close"],
                         "Regime": s[1].loc[s[1]["Signal"] == 1, "Regime"],
                         "Signal": "Buy"}),
            pd.DataFrame({"Price": s[1].loc[s[1]["Signal"] == -1, "Close"],
                         "Regime": s[1].loc[s[1]["Signal"] == -1, "Regime"],
                         "Signal": "Sell"}),
        ])
        signals.index = pd.MultiIndex.from_product([signals.index, [s[0]]], names = ["Date", "Symbol"])
        trades = trades.append(signals)
 
    trades.sort_index(inplace = True)
    trades.index = pd.MultiIndex.from_tuples(trades.index, names = ["Date", "Symbol"])
 
    return trades
 
 
def backtest(signals, cash, port_value = .1, batch = 100):
    """
    :param signals: pandas DataFrame containing buy and sell signals with stock prices and symbols, like that returned by ma_crossover_orders
    :param cash: integer for starting cash value
    :param port_value: maximum proportion of portfolio to risk on any single trade
    :param batch: Trading batch sizes
 
    :return: pandas DataFrame with backtesting results
 
    This function backtests strategies, with the signals generated by the strategies being passed in the signals DataFrame. A fictitious portfolio is simulated and the returns generated by this portfolio are reported.
    """
 
    SYMBOL = 1 # Constant for which element in index represents symbol
    portfolio = dict()    # Will contain how many stocks are in the portfolio for a given symbol
    port_prices = dict()  # Tracks old trade prices for determining profits
    # Dataframe that will contain backtesting report
    results = pd.DataFrame({"Start Cash": [],
                            "End Cash": [],
                            "Portfolio Value": [],
                            "Type": [],
                            "Shares": [],
                            "Share Price": [],
                            "Trade Value": [],
                            "Profit per Share": [],
                            "Total Profit": []})
 
    for index, row in signals.iterrows():
        # These first few lines are done for any trade
        shares = portfolio.setdefault(index[SYMBOL], 0)
        trade_val = 0
        batches = 0
        cash_change = row["Price"] * shares   # Shares could potentially be a positive or negative number (cash_change will be added in the end; negative shares indicate a short)
        portfolio[index[SYMBOL]] = 0  # For a given symbol, a position is effectively cleared
 
        old_price = port_prices.setdefault(index[SYMBOL], row["Price"])
        portfolio_val = 0
        for key, val in portfolio.items():
            portfolio_val += val * port_prices[key]
 
        if row["Signal"] == "Buy" and row["Regime"] == 1:  # Entering a long position
            batches = np.floor((portfolio_val + cash) * port_value) // np.ceil(batch * row["Price"]) # Maximum number of batches of stocks invested in
            trade_val = batches * batch * row["Price"] # How much money is put on the line with each trade
            cash_change -= trade_val  # We are buying shares so cash will go down
            portfolio[index[SYMBOL]] = batches * batch  # Recording how many shares are currently invested in the stock
            port_prices[index[SYMBOL]] = row["Price"]   # Record price
            old_price = row["Price"]
        elif row["Signal"] == "Sell" and row["Regime"] == -1: # Entering a short
            pass
            # Do nothing; can we provide a method for shorting the market?
        #else:
            #raise ValueError("I don't know what to do with signal " + row["Signal"])
 
        pprofit = row["Price"] - old_price   # Compute profit per share; old_price is set in such a way that entering a position results in a profit of zero
 
        # Update report
        results = results.append(pd.DataFrame({
                "Start Cash": cash,
                "End Cash": cash + cash_change,
                "Portfolio Value": cash + cash_change + portfolio_val + trade_val,
                "Type": row["Signal"],
                "Shares": batch * batches,
                "Share Price": row["Price"],
                "Trade Value": abs(cash_change),
                "Profit per Share": pprofit,
                "Total Profit": batches * batch * pprofit
            }, index = [index]))
        cash += cash_change  # Final change to cash balance
 
    results.sort_index(inplace = True)
    results.index = pd.MultiIndex.from_tuples(results.index, names = ["Date", "Symbol"])
 
    return results
 
# Get more stocks
microsoft = web.DataReader("MSFT", "yahoo", start, end)
google = web.DataReader("GOOG", "yahoo", start, end)
facebook = web.DataReader("FB", "yahoo", start, end)
twitter = web.DataReader("TWTR", "yahoo", start, end)
netflix = web.DataReader("NFLX", "yahoo", start, end)
amazon = web.DataReader("AMZN", "yahoo", start, end)
yahoo = web.DataReader("YHOO", "yahoo", start, end)
sony = web.DataReader("SNY", "yahoo", start, end)
nintendo = web.DataReader("NTDOY", "yahoo", start, end)
ibm = web.DataReader("IBM", "yahoo", start, end)
hp = web.DataReader("HPQ", "yahoo", start, end)

signals = ma_crossover_orders([("AAPL", ohlc_adj(apple)),
                              ("MSFT",  ohlc_adj(microsoft)),
                              ("GOOG",  ohlc_adj(google)),
                              ("FB",    ohlc_adj(facebook)),
                              ("TWTR",  ohlc_adj(twitter)),
                              ("NFLX",  ohlc_adj(netflix)),
                              ("AMZN",  ohlc_adj(amazon)),
                              ("YHOO",  ohlc_adj(yahoo)),
                              ("SNY",   ohlc_adj(yahoo)),
                              ("NTDOY", ohlc_adj(nintendo)),
                              ("IBM",   ohlc_adj(ibm)),
                              ("HPQ",   ohlc_adj(hp))],
                            fast = 20, slow = 50)
signals

		Price	Regime	Signal
Date	Symbol
2010-03-16	AAPL	29.355667	1.0	Buy
AMZN	131.789993	1.0	Buy
GOOG	282.318173	-1.0	Sell
HPQ	20.722316	1.0	Buy
IBM	110.563240	1.0	Buy
MSFT	24.677580	-1.0	Sell
NFLX	10.090000	1.0	Buy
NTDOY	37.099998	1.0	Buy
SNY	16.360001	-1.0	Sell
YHOO	16.360001	-1.0	Sell
2010-03-17	SNY	16.500000	1.0	Buy
YHOO	16.500000	1.0	Buy
2010-03-22	GOOG	278.472004	1.0	Buy
2010-03-23	MSFT	25.106096	1.0	Buy
2010-05-03	GOOG	265.035411	-1.0	Sell
2010-05-10	HPQ	19.435830	-1.0	Sell
2010-05-14	NTDOY	35.799999	-1.0	Sell
2010-05-17	SNY	16.270000	-1.0	Sell
YHOO	16.270000	-1.0	Sell
2010-05-19	AMZN	124.589996	-1.0	Sell
MSFT	23.835187	-1.0	Sell
2010-05-21	IBM	108.322991	-1.0	Sell
2010-06-10	AAPL	32.764038	0.0	Sell
2010-06-11	AAPL	33.156405	-1.0	Sell
2010-06-18	AAPL	35.845436	1.0	Buy
2010-06-28	IBM	111.397697	1.0	Buy
2010-07-01	IBM	105.861499	-1.0	Sell
2010-07-06	IBM	106.630175	1.0	Buy
2010-07-09	NTDOY	36.950001	1.0	Buy
2010-07-20	IBM	109.298956	-1.0	Sell
…	…	…	…	…
2016-06-23	AAPL	95.582210	1.0	Buy
TWTR	17.040001	1.0	Buy
2016-06-27	AAPL	91.544087	-1.0	Sell
FB	108.970001	-1.0	Sell
2016-06-28	SNY	36.040001	-1.0	Sell
YHOO	36.040001	-1.0	Sell
2016-06-30	AAPL	95.084904	1.0	Buy
NFLX	91.480003	0.0	Sell
2016-07-01	NFLX	96.669998	-1.0	Sell
SNY	37.990002	1.0	Buy
YHOO	37.990002	1.0	Buy
2016-07-11	AAPL	96.457473	-1.0	Sell
NTDOY	27.700001	1.0	Buy
2016-07-14	MSFT	53.407133	1.0	Buy
2016-07-25	AAPL	96.815526	1.0	Buy
FB	121.629997	1.0	Buy
2016-07-26	GOOG	738.419983	1.0	Buy
2016-08-18	NFLX	96.160004	1.0	Buy
2016-09-01	AAPL	106.730003	1.0	Sell
2016-09-02	AMZN	772.440002	1.0	Sell
FB	126.510002	1.0	Sell
GOOG	771.460022	1.0	Sell
HPQ	14.490000	1.0	Sell
IBM	159.550003	1.0	Sell
MSFT	57.669998	1.0	Sell
NFLX	97.379997	1.0	Sell
NTDOY	28.840000	1.0	Sell
SNY	43.279999	1.0	Sell
TWTR	19.549999	1.0	Sell
YHOO	43.279999	1.0	Sell

475 rows × 3 columns

1 2	bk = backtest(signals, 1000000) bk

		End Cash	Portfolio Value	Profit per Share	Share Price	Shares	Start Cash	Total Profit	Trade Value	Type
Date	Symbol
2010-03-16	AAPL	9.001907e+05	1.000000e+06	0.000000	29.355667	3400.0	1.000000e+06	0.0	99809.2678	Buy
AMZN	8.079377e+05	1.000000e+06	0.000000	131.789993	700.0	9.001907e+05	0.0	92252.9951	Buy
GOOG	8.079377e+05	1.000000e+06	0.000000	282.318173	0.0	8.079377e+05	0.0	0.0000	Sell
HPQ	7.084706e+05	1.000000e+06	0.000000	20.722316	4800.0	8.079377e+05	0.0	99467.1168	Buy
IBM	6.089637e+05	1.000000e+06	0.000000	110.563240	900.0	7.084706e+05	0.0	99506.9160	Buy
MSFT	6.089637e+05	1.000000e+06	0.000000	24.677580	0.0	6.089637e+05	0.0	0.0000	Sell
NFLX	5.090727e+05	1.000000e+06	0.000000	10.090000	9900.0	6.089637e+05	0.0	99891.0000	Buy
NTDOY	4.126127e+05	1.000000e+06	0.000000	37.099998	2600.0	5.090727e+05	0.0	96459.9948	Buy
SNY	4.126127e+05	1.000000e+06	0.000000	16.360001	0.0	4.126127e+05	0.0	0.0000	Sell
YHOO	4.126127e+05	1.000000e+06	0.000000	16.360001	0.0	4.126127e+05	0.0	0.0000	Sell
2010-03-17	SNY	3.136127e+05	1.000000e+06	0.000000	16.500000	6000.0	4.126127e+05	0.0	99000.0000	Buy
YHOO	2.146127e+05	1.000000e+06	0.000000	16.500000	6000.0	3.136127e+05	0.0	99000.0000	Buy
2010-03-22	GOOG	1.310711e+05	1.000000e+06	0.000000	278.472004	300.0	2.146127e+05	0.0	83541.6012	Buy
2010-03-23	MSFT	3.315733e+04	1.000000e+06	0.000000	25.106096	3900.0	1.310711e+05	0.0	97913.7744	Buy
2010-05-03	GOOG	1.126680e+05	9.959690e+05	-13.436593	265.035411	0.0	3.315733e+04	-0.0	79510.6233	Sell
2010-05-10	HPQ	2.059599e+05	9.897939e+05	-1.286486	19.435830	0.0	1.126680e+05	-0.0	93291.9840	Sell
2010-05-14	NTDOY	2.990399e+05	9.864139e+05	-1.299999	35.799999	0.0	2.059599e+05	-0.0	93079.9974	Sell
2010-05-17	SNY	3.966599e+05	9.850339e+05	-0.230000	16.270000	0.0	2.990399e+05	-0.0	97620.0000	Sell
YHOO	4.942799e+05	9.836539e+05	-0.230000	16.270000	0.0	3.966599e+05	-0.0	97620.0000	Sell
2010-05-19	AMZN	5.814929e+05	9.786139e+05	-7.199997	124.589996	0.0	4.942799e+05	-0.0	87212.9972	Sell
MSFT	6.744502e+05	9.736573e+05	-1.270909	23.835187	0.0	5.814929e+05	-0.0	92957.2293	Sell
2010-05-21	IBM	7.719409e+05	9.716411e+05	-2.240249	108.322991	0.0	6.744502e+05	-0.0	97490.6919	Sell
2010-06-10	AAPL	8.833386e+05	9.832296e+05	3.408371	32.764038	0.0	7.719409e+05	0.0	111397.7292	Sell
2010-06-11	AAPL	8.833386e+05	9.832296e+05	3.800738	33.156405	0.0	8.833386e+05	0.0	0.0000	Sell
2010-06-18	AAPL	7.865559e+05	9.832296e+05	0.000000	35.845436	2700.0	8.833386e+05	0.0	96782.6772	Buy
2010-06-28	IBM	6.974378e+05	9.832296e+05	0.000000	111.397697	800.0	7.865559e+05	0.0	89118.1576	Buy
2010-07-01	IBM	7.821270e+05	9.788006e+05	-5.536198	105.861499	0.0	6.974378e+05	-0.0	84689.1992	Sell
2010-07-06	IBM	6.861598e+05	9.788006e+05	0.000000	106.630175	900.0	7.821270e+05	0.0	95967.1575	Buy
2010-07-09	NTDOY	5.900898e+05	9.788006e+05	0.000000	36.950001	2600.0	6.861598e+05	0.0	96070.0026	Buy
2010-07-20	IBM	6.884589e+05	9.812025e+05	2.668781	109.298956	0.0	5.900898e+05	0.0	98369.0604	Sell
…	…	…	…	…	…	…	…	…	…	…
2016-06-23	AAPL	3.951693e+05	1.863808e+06	0.000000	95.582210	1900.0	5.767755e+05	0.0	181606.1990	Buy
TWTR	2.094333e+05	1.863808e+06	0.000000	17.040001	10900.0	3.951693e+05	0.0	185736.0109	Buy
2016-06-27	AAPL	3.833670e+05	1.856135e+06	-4.038123	91.544087	0.0	2.094333e+05	-0.0	173933.7653	Sell
FB	5.795130e+05	1.862921e+06	3.770004	108.970001	0.0	3.833670e+05	0.0	196146.0018	Sell
2016-06-28	SNY	7.885450e+05	1.880959e+06	3.110001	36.040001	0.0	5.795130e+05	0.0	209032.0058	Sell
YHOO	9.975770e+05	1.898997e+06	3.110001	36.040001	0.0	7.885450e+05	0.0	209032.0058	Sell
2016-06-30	AAPL	8.169157e+05	1.898997e+06	0.000000	95.084904	1900.0	9.975770e+05	0.0	180661.3176	Buy
NFLX	9.907277e+05	1.893981e+06	-2.640000	91.480003	0.0	8.169157e+05	-0.0	173812.0057	Sell
2016-07-01	NFLX	9.907277e+05	1.893981e+06	2.549995	96.669998	0.0	9.907277e+05	0.0	0.0000	Sell
SNY	8.045767e+05	1.893981e+06	0.000000	37.990002	4900.0	9.907277e+05	0.0	186151.0098	Buy
YHOO	6.184257e+05	1.893981e+06	0.000000	37.990002	4900.0	8.045767e+05	0.0	186151.0098	Buy
2016-07-11	AAPL	8.016949e+05	1.896589e+06	1.372569	96.457473	0.0	6.184257e+05	0.0	183269.1987	Sell
NTDOY	6.133349e+05	1.896589e+06	0.000000	27.700001	6800.0	8.016949e+05	0.0	188360.0068	Buy
2016-07-14	MSFT	4.264099e+05	1.896589e+06	0.000000	53.407133	3500.0	6.133349e+05	0.0	186924.9655	Buy
2016-07-25	AAPL	2.424604e+05	1.896589e+06	0.000000	96.815526	1900.0	4.264099e+05	0.0	183949.4994	Buy
FB	6.001543e+04	1.896589e+06	0.000000	121.629997	1500.0	2.424604e+05	0.0	182444.9955	Buy
2016-07-26	GOOG	-8.766857e+04	1.896589e+06	0.000000	738.419983	200.0	6.001543e+04	0.0	147683.9966	Buy
2016-08-18	NFLX	-2.703726e+05	1.896589e+06	0.000000	96.160004	1900.0	-8.766857e+04	0.0	182704.0076	Buy
2016-09-01	AAPL	-6.758557e+04	1.915427e+06	9.914477	106.730003	0.0	-2.703726e+05	0.0	202787.0057	Sell
2016-09-02	AMZN	1.641464e+05	1.979327e+06	213.000000	772.440002	0.0	-6.758557e+04	0.0	231732.0006	Sell
FB	3.539114e+05	1.986647e+06	4.880005	126.510002	0.0	1.641464e+05	0.0	189765.0030	Sell
GOOG	5.082034e+05	1.993255e+06	33.040039	771.460022	0.0	3.539114e+05	0.0	154292.0044	Sell
HPQ	7.081654e+05	2.006030e+06	0.925746	14.490000	0.0	5.082034e+05	0.0	199962.0000	Sell
IBM	8.996254e+05	2.015652e+06	8.018727	159.550003	0.0	7.081654e+05	0.0	191460.0036	Sell
MSFT	1.101470e+06	2.030572e+06	4.262865	57.669998	0.0	8.996254e+05	0.0	201844.9930	Sell
NFLX	1.286492e+06	2.032890e+06	1.219993	97.379997	0.0	1.101470e+06	0.0	185021.9943	Sell
NTDOY	1.482604e+06	2.040642e+06	1.139999	28.840000	0.0	1.286492e+06	0.0	196112.0000	Sell
SNY	1.694676e+06	2.066563e+06	5.289997	43.279999	0.0	1.482604e+06	0.0	212071.9951	Sell
TWTR	1.907771e+06	2.093922e+06	2.509998	19.549999	0.0	1.694676e+06	0.0	213094.9891	Sell
YHOO	2.119843e+06	2.119843e+06	5.289997	43.279999	0.0	1.907771e+06	0.0	212071.9951	Sell

475 rows × 9 columns

1	bk["Portfolio Value"].groupby(level = 0).apply(lambda x: x[-1]).plot()

A more realistic portfolio that can invest in any in a list of twelve (tech) stocks has a final growth of about 100%. How good is this? While on the surface not bad, we will see we could have done better.

Benchmarking

Backtesting is only part of evaluating the efficacy of a trading strategy. We would like to benchmark the strategy, or compare it to other available (usually well-known) strategies in order to determine how well we have done.

Whenever you evaluate a trading system, there is one strategy that you should always check, one that beats all but a handful of managed mutual funds and investment managers: buy and hold SPY. The efficient market hypothesis claims that it is all but impossible for anyone to beat the market. Thus, one should always buy an index fund that merely reflects the composition of the market. SPY is an exchange-traded fund (a mutual fund that is traded on the market like a stock) whose value effectively represents the value of the stocks in the S&P 500 stock index. By buying and holding SPY, we are effectively trying to match our returns with the market rather than beat it.

I obtain data on SPY below, and look at the profits for simply buying and holding SPY.

1 2	spyder = web.DataReader("SPY", "yahoo", start, end) spyder.iloc[[0,-1],:]

	Open	High	Low	Close	Volume	Adj Close
Date
2010-01-04	112.370003	113.389999	111.510002	113.330002	118944600	99.292299
2016-09-01	217.369995	217.729996	216.029999	217.389999	93859000	217.389999

batches = 1000000 // np.ceil(100 * spyder.ix[0,"Adj Close"]) # Maximum number of batches of stocks invested in
trade_val = batches * batch * spyder.ix[0,"Adj Close"] # How much money is used to buy SPY
final_val = batches * batch * spyder.ix[-1,"Adj Close"] + (1000000 - trade_val) # Final value of the portfolio
final_val


2180977.0

# We see that the buy-and-hold strategy beats the strategy we developed earlier. I would also like to see a plot.
ax_bench = (spyder["Adj Close"] / spyder.ix[0, "Adj Close"]).plot(label = "SPY")
ax_bench = (bk["Portfolio Value"].groupby(level = 0).apply(lambda x: x[-1]) / 1000000).plot(ax = ax_bench, label = "Portfolio")
ax_bench.legend(ax_bench.get_lines(), [l.get_label() for l in ax_bench.get_lines()], loc = 'best')
ax_bench

Buying and holding SPY beats our trading system, at least how we currently set it up, and we haven’t even accounted for how expensive our more complex strategy is in terms of fees. Given both the opportunity cost and the expense associated with the active strategy, we should not use it.

What could we do to improve the performance of our system? For starters, we could try diversifying. All the stocks we considered were tech companies, which means that if the tech industry is doing poorly, our portfolio will reflect that. We could try developing a system that can also short stocks or bet bearishly, so we can take advantage of movement in any direction. We could seek means for forecasting how high we expect a stock to move. Whatever we do, though, must beat this benchmark; otherwise there is an opportunity cost associated with our trading system.

Other benchmark strategies exist, and if our trading system beat the “buy and hold SPY” strategy, we may check against them. Some such strategies include:

Buy SPY when its closing monthly price is aboves its ten-month moving average.
Buy SPY when its ten-month momentum is positive. (Momentum is the first difference of a moving average process, or $MO^q_t = MA^q_t - MA^q_{t - 1}$ .)

(I first read of these strategies here.) The general lesson still holds: don’t use a complex trading system with lots of active trading when a simple strategy involving an index fund without frequent trading beats it. This is actually a very difficult requirement to meet.

As a final note, suppose that your trading system did manage to beat any baseline strategy thrown at it in backtesting. Does backtesting predict future performance? Not at all. Backtesting has a propensity for overfitting, so just because backtesting predicts high growth doesn’t mean that growth will hold in the future.

Conclusion

While this lecture ends on a depressing note, keep in mind that the efficient market hypothesis has many critics. My own opinion is that as trading becomes more algorithmic, beating the market will become more difficult. That said, it may be possible to beat the market, even though mutual funds seem incapable of doing so (bear in mind, though, that part of the reason mutual funds perform so poorly is because of fees, which is not a concern for index funds).

This lecture is very brief, covering only one type of strategy: strategies based on moving averages. Many other trading signals exist and employed. Additionally, we never discussed in depth shorting stocks, currency trading, or stock options. Stock options, in particular, are a rich subject that offer many different ways to bet on the direction of a stock. You can read more about derivatives (including stock options and other derivatives) in the bookDerivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging, which is available from the University of Utah library (for University of Utah students).

Another resource (which I used as a reference while writing this lecture) is the O’Reilly book Python for Finance, also available from the University of Utah library.

Remember that it is possible (if not common) to lose money in the stock market. It’s also true, though, that it’s difficult to find returns like those found in stocks, and any investment strategy should take investing in it seriously. This lecture is intended to provide a starting point for evaluating stock trading and investment, and I hope you continue to explore these ideas.

Problems

Problem 1

Devise a trading strategy as described in lecture based on moving-average crossovers (you do not need a stop-loss). Pick a list of at least 15 stocks that have existed since January 1st, 2010. Backtest your strategy with the stocks chosen and benchmark the performance of your portfolio against the performance of SPY. Are you able to beat the market?

Problem 2

Realistically, with every trade a commission is applied. Read about howcommission works, and modify the backtest() function in the lecture to allow multiple commission structures (flat fee, percentage of portfolio, etc.) to be simulated.

Additionally, our current moving average crossover strategy results in a trading signal triggering the moment two moving averages cross. We would like to make sure signals are more robust, either by:

Triggering a trade when the moving averages differ by a fixed amount
Triggering a trade when the moving averages differ by some amount of **(rolling) standard deviations**, which are defined by:

$SD^n_t = \sqrt{\frac{1}{n - 1} \sum_{i = 0}^{n - 1} (x_{t - 1} - MA^n_t)^2}$

(**pandas** does have means for computing rolling standard deviations.) Regarding the latter, if the moving averages differ by $p \times SD^n_t$ , a trading signal is sent. Modify the function ma_crossover_orders() so that these restrictions can be implemented. Specifically, you should have the ability to set how many days are in the window of the rolling standard deviation (it need not be the same as either the fast or slow moving average windows), and how many standard deviations the moving averages must differ by in order for a signal to be sent. (The current behavior of these functions should still be possible; in fact, it should be the default behavior.)

Once these changes have been made, repeat problem 1, including a realistic commission scheme (consider looking up one from a brokerage firm) when simulating the performance of the portfolio, and requiring the moving averages differ by some fixed number or standard deviations in order for signals to be sent.

Problem 3

We did not set up our trading system to allow for shorting stocks. Short selling is much trickier, since losses from short selling are unlimited (a long position, on the other hand, limits losses to the total value of the assets purchased). Read about short selling here. Then modify the functionbacktest() to allow for short selling. How will the function decide how to conduct short sales, including how many shares to short and how to account for shorted stocks when conducting other trades? We leave this up to you to decide. As a hint, the number of shares being shorted can be represented internally in the function by a negative number.

Once this is done, repeat Problem 1, perhaps also using features implemented in Problem 2.

Origin Part I

Origin Part II