data science tutorials and snippets prepared by greysweater42
There are two main reasons for this project:
I’ve recently taken a deep dive into pytorch and I want to test it on a real task. I’ve been using pytorch extensively at work, but I usually didn’t have enough time to study its quirks as thoroughly as I wanted.
When I was studying Quantitative Methods in Economics, my all-time dream was to predict fluctuations on the stock market, i.e. to come up with a legal money factory. During my studies I realized that this isn’t possible, but I still want to give it a try ;)
a good understanding of LSTM nets
proficiency in pytorch
an end-to-end project in pytorch for future reference
any money ;)
I need to collect the data. The best source for me will be the Warsaw Stock Exchange: I am Polish, so hopefully I’ll have some intuition about the stocks I buy/sell.
First I will concentrate on the regular stock market; later on maybe I’ll delve into some more exotic instruments, like derivatives.
What if there are many irrational players on the market whom my algorithm could outsmart?
I’ve been searching for APIs to the Warsaw Stock Exchange for a while, and I convinced myself that maybe I should try something more user-friendly for a start. I found this article, which proposes yfinance, an open-source library that fetches data from Yahoo Finance. Let’s give it a try (though IEX looks interesting as well).
Aw, that is a great opportunity to remember how plotly works.
import yfinance as yf
import plotly.express as px

# download the full daily price history for Microsoft
msft = yf.Ticker("MSFT")
ms = msft.history(period="max")
# "Date" is the index; turn it into a regular column so plotly can use it
ms.reset_index(inplace=True)
fig = px.line(ms, x="Date", y="Close", title='Microsoft stock prices')
fig.show()
By the way, displaying plotly in an Rmarkdown file is ridiculously complicated.
OK, it looks like I got some data.
At first I thought an LSTM would be perfect, as it is widely used for time series prediction. However, it does not seem able to model dependencies between different companies (I am used to saying ‘model’, as in economics/econometrics, even though in machine learning we rather ‘fit’ functions to the data). As a result, I will try to implement a convolutional neural network as a super-flexible function which should fit my data nicely.
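To make the idea a bit more concrete, here is a minimal sketch of what such a network could look like in pytorch. Everything in it is an assumption rather than a final design: the class name `StockCNN`, the layer sizes, and the input layout of `(batch, n_companies, window)`, where each company is treated as one input channel of daily returns. Because every convolution kernel sees all channels at once, the network can in principle pick up dependencies between companies.

```python
import torch
from torch import nn


class StockCNN(nn.Module):
    """1D CNN over a window of daily returns; companies act as input channels."""

    def __init__(self, n_companies=10, hidden=32):
        super().__init__()
        self.features = nn.Sequential(
            # each kernel spans 5 days and mixes all companies together
            nn.Conv1d(n_companies, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis to a single value
        )
        self.head = nn.Linear(hidden, n_companies)  # one prediction per company

    def forward(self, x):
        # x: (batch, n_companies, window)
        h = self.features(x).squeeze(-1)  # (batch, hidden)
        return self.head(h)  # (batch, n_companies)


model = StockCNN()
batch = torch.randn(8, 10, 30)  # random stand-in for 30 days of real returns
prediction = model(batch)  # hypothetical target: next-day return per company
```

The adaptive pooling layer means the window length does not have to be fixed up front, which is handy while experimenting with different history lengths.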
But first I need:
more data, preferably on ~10 different companies, from the same market segment (for a start)
ideas on how to create a sensible dataset from them. Throwing them into a neural network without any consideration would clearly not work, which would result in no $$$ and in me being sad.
The “more data” point is as simple as changing “MSFT” into some other stock, but creating a valid dataset? For now I assume that:
stock fluctuations in the same market segment are positively correlated, but there might have been some unusual events where the tendency was negative, e.g. Apple’s stock went down while the rest went up. These are the scenarios I am particularly interested in.
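The assumption above can be turned into a rough first pass over the data. The sketch below is only one possible reading of it: `load_closes` and `divergence_days` are hypothetical helper names, and the 1% `threshold` is an arbitrary choice of mine. It collects closing prices for several tickers with the same `Ticker(...).history(...)` call as before, computes daily returns, and flags the days on which one stock fell while all the others rose.

```python
import pandas as pd


def load_closes(tickers, period="5y"):
    """Download daily closing prices, one column per ticker."""
    import yfinance as yf  # imported lazily so the helper below works offline

    closes = {t: yf.Ticker(t).history(period=period)["Close"] for t in tickers}
    # keep only the dates on which every ticker has a price
    return pd.DataFrame(closes).dropna()


def divergence_days(closes, threshold=0.01):
    """Return (date, ticker) pairs where one stock fell while all others rose.

    `threshold` is an assumed minimum absolute daily return (1% by default)
    that both the drop and the rises have to exceed.
    """
    returns = closes.pct_change().dropna()
    events = []
    for ticker in returns.columns:
        own = returns[ticker]
        others = returns.drop(columns=ticker)
        mask = (own < -threshold) & (others > threshold).all(axis=1)
        events.extend((date, ticker) for date in returns.index[mask])
    return sorted(events)
```

For example, `divergence_days(load_closes(["MSFT", "AAPL", "GOOG"]))` would list the days on which exactly one of these three moved against the other two (the ticker choice here is just an illustration, not a decision about the final dataset).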
I should keep in mind that in order to outsmart other players on the market, I must learn their weaknesses, and the strategy above does not seem to take advantage of any of them… but it’s still something for a start.
More data!
To be continued…