The post Need for Speed: High Frequency Economic News Trading appeared first on Justinas Brazys.

The US Employment report is considered the most important piece of economic news. Traders especially await the non-farm payrolls (NFP) number. For example, the June 3 report indicated that 38,000 new jobs were created in May. Is that good or bad economic news for assets? In theory, the price reaction depends on two things:

(1) which part of the report is actually news, and

(2) what this news means for an asset.

News can be estimated as the unexpected part of the announcement: subtract the expected figure from the actual announced figure. Economists expected around 160,000 new jobs, so 38,000 is well below expectations. That is bad for economic growth, as fewer new jobs indicate slower growth. What would this mean for the USDJPY pair? In simple terms, this is bad news for the US economy, so USD is expected to depreciate against JPY in reaction to the news.
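The surprise calculation above can be sketched in a few lines (figures taken from the text; Python is used here purely for illustration):

```python
# NFP surprise: actual announced figure minus the consensus expectation
actual = 38_000         # new jobs created in May (June 3 report)
expected = 160_000      # economists' consensus expectation
surprise = actual - expected  # -122000: a large negative surprise

# The sign of the surprise maps to the expected USDJPY reaction:
# bad news for the US economy -> USD expected to depreciate vs. JPY
direction = "sell USDJPY" if surprise < 0 else "buy USDJPY"
print(surprise, direction)
```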

If you knew the news, how much profit could you expect to make? How fast would you need to be? Below is the cumulative average return, assuming you were able to trade NFP at the USDJPY prices available at 8:30 EST (the US Employment report release time):

The almost 5 bps of negative return at the beginning is the bid-ask spread cost. It takes 1.5 seconds to reach the breakeven point, and there is little systematic price movement beyond 8 seconds. In total, one could expect to make around 25 basis points net per trade. Since the foreign exchange market allows for easy leverage, these 25 bps can easily turn into 2.5% per trade. There are 12 monthly announcements, hence a 30% annual return. If we assume 10 seconds of holding time per trade, that makes 2 minutes of exposure per annum for a 30% return.
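The back-of-the-envelope arithmetic in this paragraph works out as follows (a sketch only; the 10x leverage factor is the one implied by 25 bps turning into 2.5%):

```python
bps = 1e-4
net_per_trade = 25 * bps        # 25 bps net per trade, before leverage
leverage = 10                   # turns 25 bps into 2.5% per trade
per_trade = net_per_trade * leverage
annual = per_trade * 12         # 12 monthly NFP announcements per year
print(f"{per_trade:.1%} per trade, {annual:.0%} per annum")  # 2.5% per trade, 30% per annum
```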

Of course, this is too good to be true. In reality it is not possible to trade with zero latency. How much latency could we tolerate? The plot below shows returns assuming the position is opened with a 0 to 30 second delay and closed 30 seconds after the announcement. The expected return drops sharply until 4 seconds after the announcement, and at 6 seconds there is no money left to be made.

This seems like a good trading strategy: even being 2 seconds late can make 10 bps per trade! However, the data I used runs from 2003 until now. Over the last 5 years technology has improved quite a bit, and more participants have entered news trading. Did that make news trading an extremely competitive business? The same latency robustness figure for 2015-2016 looks a bit different (see below) compared to the one above. There is no profit to be made after just 2 seconds.

And now the caveats. Could you trade on the information I have just provided? Yes, if you believe the assumptions I make hold in reality:

- The news is actually available to you at 8:30 EST. This is usually not the case: there is a slight delay as the announcement dispatched from the Bureau of Labor Statistics travels to you. Once you have processed it, your order has to travel to the trading engine. All of this takes time.
- In addition, we assumed that the tick data timestamps are in sync with the BLS announcement time. The last plot could suggest that the BLS server clock is 1 second behind our price server.
- There is plenty of liquidity to trade. In reality, liquidity dries up significantly around important news, so even if you were the first to send your trade order, you might end up moving the market too much.

From my experience, the higher the frequency of a trading strategy, the more precision you need in the data. And often, at higher frequencies, assumptions about data quality and execution speed will make or break a trading strategy once it goes live.


The post Is US becoming Japan? Population dynamics, expected monetary policy and earnings yield appeared first on Justinas Brazys.

What are the implications of working-population dynamics for the natural rate of return? There could be several equally convincing explanations: more people working -> higher growth; fewer people working (e.g. more elderly) -> more consumption -> more growth; more people working -> more innovation -> more growth.

This looks more like an empirical matter. At least 80% of empirical work is preparing data, and this time was no exception. You can download the data here. The data set contains the fraction of working population (working), the short-term rate (ST_interest) and the local equity earnings yield (earnings over price) for 39 countries/regions. Let's load the data.

# load data
load("aging_data_web.Rdata")

Consider a fixed effects panel data model with \(N\) cross-sectional units and \(T\) time periods:

\(y_{it} = X_{it}\mathbf{\beta}+\alpha_{i}+u_{it}\)

for \(t=1,\dots,T\) and \(i=1,\dots,N\), where \(y_{it}\) is the dependent variable observed for country \(i\) at time \(t\), \(X_{it}\) is the time-variant \(1\times k\) regressor vector, \(\alpha_{i}\) is the unobserved time-invariant individual effect (intercept) and \(u_{it}\) is the error term.
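The fixed effects ("within") estimator removes \(\alpha_i\) by demeaning each variable within a country before running OLS. A minimal sketch of that mechanic on made-up, noise-free data (Python/numpy here for illustration; the post itself uses R's plm):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 3, 10, -0.05                  # 3 countries, 10 years, true slope
alpha = np.array([1.0, 2.0, 3.0])          # unobserved country intercepts
x = rng.normal(size=(N, T))
y = beta * x + alpha[:, None]              # no noise, so "within" recovers beta exactly

# within transformation: subtract each country's own time-series mean
x_w = x - x.mean(axis=1, keepdims=True)    # the alpha_i cancel out here
y_w = y - y.mean(axis=1, keepdims=True)

beta_hat = (x_w * y_w).sum() / (x_w ** 2).sum()  # pooled OLS on demeaned data
print(round(beta_hat, 6))  # -0.05
```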

Let's estimate the effect of the working population on the log earnings yield (EP) using all available country data. It is useful to correct for time effects, so we include a year variable (yr) in the model.

library(plm)    # panel data models
library(lmtest) # for coeftest()
fixed_EP <- plm(log(EP) ~ working + yr, data = data,
                index = c("Location"), model = "within")
coeftest(fixed_EP, vcovHC(fixed_EP, method = "arellano"))

The last line reports coefficient significance tests with standard errors robust to heteroscedasticity and autocorrelation.

t test of coefficients:

          Estimate Std. Error t value  Pr(>|t|)
working -0.0509624  0.0170459 -2.9897 0.0028662 **
yr      -0.0076529  0.0021078 -3.6307 0.0002982 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

It seems that over time the earnings yield has been decreasing, and that increases in the working population have a negative effect on the earnings yield. How large is the effect? Since we are modelling the log earnings yield, a transformation is needed.

exp(fixed_EP$coefficients[2]) - 1  # yr: annual time effect
exp(fixed_EP$coefficients[1]) - 1  # working: effect of a 1pp increase

The earnings yield decreases by 0.7% per year from its current level; i.e. if the current earnings yield is 10%, then next year, due to the time effect alone, it will be about 9.93%. Similarly, a 1 percentage point increase in the working population would decrease the earnings yield by about 4.9%, which is a drop of roughly 0.49% if the current earnings yield is 10%.
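Plugging the estimated coefficients into the exp()-1 transformation reproduces the numbers quoted above (a quick check in Python; the coefficients are those from the regression output):

```python
import math

b_working, b_yr = -0.0509624, -0.0076529   # estimates from the table above

yr_effect = math.exp(b_yr) - 1             # ~ -0.76% per year, relative change
working_effect = math.exp(b_working) - 1   # ~ -4.97% per 1pp of working population

ep = 0.10  # suppose the current earnings yield is 10%
print(round(ep * (1 + yr_effect), 4))      # next-year yield due to time alone, ~0.0992
print(round(ep * working_effect, 4))       # absolute drop per 1pp of working, ~ -0.005
```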

Similarly, we can estimate the model for short-term rates:

# Modelling short term rates
fixed_ST <- plm(log(ST_interest) ~ working + yr, data = data,
                index = c("Location"), model = "within")
coeftest(fixed_ST, vcov. = vcovHC(fixed_ST, method = "arellano"))

t test of coefficients:

          Estimate Std. Error  t value  Pr(>|t|)
working  0.1867083  0.0516120   3.6175 0.0003147 ***
yr      -0.0857416  0.0059476 -14.4163 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The effect of time is negative, while an increase in the working population is associated with a higher short-term rate.

exp(fixed_ST$coefficients[2]) - 1  # yr: annual time effect
exp(fixed_ST$coefficients[1]) - 1  # working: effect of a 1pp increase

The short-term rate decreases by 8% per year from its current level; i.e. if the current interest rate is 1%, then next year, due to the time effect alone, it will be about 0.92%. A 1 percentage point increase in the working population is approximately a 0.21% increase if the current rate is 1%.

For the last part, let's see what we could expect of the policy rate and earnings yield in, say, the United States over the next 45 years. Because the dataset includes projections of the working population up to 2060, we can fit the model and make forecasts (R code at the end).

As pointed out above, from the working-population perspective the U.S. looks like Japan lagged by 15 years. Due to the expected shrinkage of the working population in the U.S., it seems short-term rates are unlikely to reach higher levels in the foreseeable future. Due to population fundamentals, the Fed does not have much room to maneuver using conventional policy. On the bright side, the earnings yield is unlikely to go down much further.

# Forecasting EP ---------------------------------------------------
fixefs <- merge(data,
                data.frame(Location = names(fixef(fixed_EP)),
                           fixef = as.numeric(fixef(fixed_EP))),
                all.x = TRUE, by = c("Location"))
fixefs <- fixefs[, ncol(fixefs)]
data$fitted_forecast_EP <- exp(fixefs + fixed_EP$coefficients[1]*data$working +
                                 fixed_EP$coefficients[2]*data$yr)
plot_data <- data[which(data$Location == "United States"),
                  c("yr", "Location", "fitted_forecast_EP", "EP")]
last_available <- max(plot_data$yr[which(!is.na(plot_data$EP))])
plot_data$fitted_forecast_EP[which(plot_data$yr <= last_available)] <- NA
plot(plot_data$yr, plot_data$EP, ty = "l", xlab = "", ylab = "EP", lwd = 3)
lines(plot_data$yr, plot_data$fitted_forecast_EP, col = 2, lwd = 2)
title(unique(plot_data$Location))

# Forecasting Short term rates -------------------------------------
fixefs <- merge(data,
                data.frame(Location = names(fixef(fixed_ST)),
                           fixef = as.numeric(fixef(fixed_ST))),
                all.x = TRUE, by = c("Location"))
fixefs <- fixefs[, ncol(fixefs)]
data$fitted_forecast_ST <- exp(fixefs + fixed_ST$coefficients[1]*data$working +
                                 fixed_ST$coefficients[2]*data$yr)
plot_data <- data[which(data$Location == "United States"),
                  c("yr", "Location", "fitted_forecast_ST", "ST_interest")]
last_available <- max(plot_data$yr[which(!is.na(plot_data$ST_interest))])
plot_data$fitted_forecast_ST[which(plot_data$yr <= last_available)] <- NA
plot(plot_data$yr, plot_data$ST_interest, ty = "l", xlab = "", ylab = "ST Interest", lwd = 3)
lines(plot_data$yr, plot_data$fitted_forecast_ST, col = 2, lwd = 3)
title(unique(plot_data$Location))


The post Big Data Analytics: Matlab vs. R appeared first on Justinas Brazys.

When it comes to data analysis there are three competing languages: Matlab, R and Python. The choice largely depends on the task at hand. In this short Matlab vs. R tutorial, let me compare R to Matlab when it comes to dealing with mixed data (aka messy, big data). What I call mixed data is when different data types refer to the same observation. For example, an observation of the company “Apple Inc.” on the date 2015-7-24: information about the company could be its stock price, volume, and ticker. As you can see, company and ticker are non-numeric, the date is of date type, and price and volume are numeric. How do Matlab and R accommodate such mixed types?

**Data types**

*R way*

R has the data type “data.frame”. It allows easy storage and manipulation of mixed-type data.

*Matlab way*

Do you know what “Mat” in Matlab stands for? Contrary to popular belief, it is not mathematics but matrix: MATrix LABoratory. The language is intended for matrix manipulation, i.e. numeric data. Until Matlab version R2013b it was not possible to store such data in a single variable that is also easy to manipulate. There is the structure type, however each field in a structure is updated independently, so element i of one field might not refer to the same observation as element i of another field. My guess is that Matlab developers decided to catch up with the capabilities of R by adding the “table” type.

**Loading data**

*R way*

AAPL <- read.csv(file = "http://jbrazys.com/wp-content/uploads/data/AAPL.csv",
                 colClasses = c("Date", "character", "numeric", "numeric", "factor"))

Show names of columns:

names(AAPL)

*Matlab way*

It is not possible to read the file directly from a URL; we need to download it first.

urlwrite('http://jbrazys.com/wp-content/uploads/data/AAPL.csv','AAPL.csv')
AAPL = readtable('AAPL.csv','Format','%{yyyy/MM/dd}D%s%f%f%C');

Show names of columns

AAPL.Properties.VariableNames

Note that reading the dates as dates (%{yyyy/MM/dd}D) is only possible from version R2014b onwards. %C stands for categorical data, %f for a floating-point number.

**Ordering data**

*R way*

AAPL <- AAPL[order(AAPL$Date),]

*Matlab way*

AAPL =sortrows(AAPL,'Date','ascend');

**Showing data**

*R way*

# show first 3 rows
AAPL[1:3,]
# show first 3 observations of Price
AAPL$Price[1:3]

*Matlab way*

Show first 3 rows

AAPL(1:3,:)

Show first 3 observations of Price

AAPL.Price(1:3)

**Manipulation of data**

A common task is computing some transformation of cross-sectional data, for example the time series of the cross-sectional standard deviation of price and of total volume. Since we have no cross-section yet, we need to load one first.

*R way*

# load data
stock_data <- read.csv(file = "http://jbrazys.com/wp-content/uploads/data/stock_data.csv",
                       colClasses = c("Date", "character", "numeric", "numeric", "factor"))
library(plyr) # package for data.frame manipulation
summarized_data <- ddply(stock_data, .(Date), summarize,
                         stdev_price = sd(Price), total_volume = sum(Volume))

What ddply does is split the data into groups and, for each group, compute the specified transformation. In this case, for each unique Date in stock_data it created the variables *stdev_price* and *total_volume*. The function can be __any__ R base or user-defined function.
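The intro listed Python as the third contender, so for comparison, here is the same split-apply-combine step in pandas (the small inline data frame is made up; column names are assumed to match the CSV above):

```python
import pandas as pd

# small stand-in for stock_data.csv (assumed columns: Date, Price, Volume)
stock_data = pd.DataFrame({
    "Date":   ["2015-07-24", "2015-07-24", "2015-07-27", "2015-07-27"],
    "Price":  [124.5, 46.7, 122.8, 46.0],
    "Volume": [1000, 2000, 1500, 2500],
})

# one step, as with ddply: per-Date std of Price and sum of Volume
summarized = stock_data.groupby("Date").agg(
    stdev_price=("Price", "std"),
    total_volume=("Volume", "sum"),
).reset_index()
print(summarized)
```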

*Matlab way*

urlwrite('http://jbrazys.com/wp-content/uploads/data/stock_data.csv','stock_data.csv')
stock_data = readtable('stock_data.csv','Format','%{yyyy/MM/dd}D%s%f%f%C');

This is a nice data table, similar to data.frame in R. However, the nice format of the table is not fully compatible with all Matlab functions. For example, to compute grouped data we can use the function varfun(), but the grouping variable must be categorical, numerical, logical or string. Therefore, if we would like to group by date it refuses to work. So we need a workaround: let's read the Date column as a string this time.

stock_data = readtable('stock_data.csv','Format','%s%s%f%f%C');

Another kink of varfun() is that it is currently not possible to apply a different transformation to each column: varfun applies the same function to all selected columns. Therefore the only way is to resort to a multi-step procedure.

Compute standard deviation of price

summarized_data1 = varfun(@std, stock_data, ...
    'InputVariables', 'Price', ...
    'GroupingVariables', 'Date');

Compute sum of Volume

summarized_data2 = varfun(@sum, stock_data, ...
    'InputVariables', 'Volume', ...
    'GroupingVariables', 'Date');

Join data and get only data that we asked for

summarized_data = join(summarized_data1, summarized_data2, 'Keys', 'Date')
summarized_data(:,[1 3 5])

Summarizing: Matlab is catching up with the functionality of R, but some features are still clumsy. For data analysis, Matlab requires multiple steps where R can do the same in one. Creating multiple temporary variables (or files) makes the code and the analysis environment unnecessarily crowded.

Although in this post I do not discuss issues related to the sheer size of big data, R has big data analytics packages (e.g. data.table) that can handle both mixed data types and a large number of observations without running out of memory.


The post Modern Portfolio Theory: Beating the Index appeared first on Justinas Brazys.

According to Sun Tzu (The Art of War), knowing your enemy is important. So if we intend to beat the market, the market is the enemy we should investigate more closely. The asset management industry is accustomed to beating specific indices. In other words, what is the market benchmark we intend to beat? To analyze this, let's take the Dow Jones as an example of the market. The Dow Jones index includes 30 companies and is price-weighted. In layman's terms, the index is a portfolio holding one share of each company, so companies with higher share prices account for a larger share of portfolio assets.
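Price weighting is easy to see with three hypothetical share prices (numbers made up for illustration):

```python
prices = {"A": 100.0, "B": 50.0, "C": 25.0}   # hypothetical share prices

# the index "portfolio" holds exactly one share of each company
portfolio_value = sum(prices.values())         # 175.0

# so each company's weight is its price over the total:
# the highest-priced stock dominates the index
weights = {k: p / portfolio_value for k, p in prices.items()}
print(weights)  # A: ~0.571, B: ~0.286, C: ~0.143
```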

Such a price-weighted portfolio may be less than efficient according to modern portfolio theory. What is modern about modern portfolio theory? The quantitative approach. To summarize, the theory equates risk with the variance of returns: more variance, more risk. The efficient portfolio is the one that achieves the required return with minimal risk. Portfolio risk can be computed as:

\(w^{\prime}Qw=\sigma^2_{portfolio}\)

with restrictions

\(\Sigma_i w_i=1\)

\(ER^{\prime}w\geq r\)

where \(Q\) is the variance-covariance matrix of the returns, \(ER\) is the vector of expected (historical average) returns, \(w\) is the column vector of portfolio allocations and \(r\) is the scalar minimum required return. By varying \(r\) we trace out the efficient frontier (blue line: no shorting, no leverage allowed; black line: unrestricted):

Anything above the frontier is unattainable and anything below it is inefficient. In the figure I mark the location of the DJIA index's risk-return. As expected, the index portfolio is inefficient and can thus be improved. With 11.1% risk we get only a 15% return. Following the vertical red line, we can select a long-only portfolio on the efficient frontier (blue line) with the same amount of risk. With the same risk we could harvest a 22.6% return: an outperformance of the index by 7.6% per annum. It seems beating the market (index) is not so difficult. Or is it? Note that the covariance matrix \(Q\) used in the optimization is in-sample, i.e. not known beforehand. The matrix can be forecasted, though the task is not easy; that is a topic for another time. (Some discussion on variance-covariance matrix forecasting and accompanying R code can be found here.)
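Under the hood, each point on the frontier solves the quadratic program above. When the return constraint binds (as it typically does at the optimum) and shorting is allowed, the problem reduces to a linear KKT system. A tiny three-asset sketch with made-up numbers (Python/numpy for illustration; the post's own implementation uses R's quadprog below):

```python
import numpy as np

Q = np.array([[0.04, 0.01, 0.00],   # made-up covariance matrix
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
ER = np.array([0.08, 0.12, 0.15])   # made-up expected returns
r = 0.10                            # required return

# minimize w'Qw  s.t.  sum(w) = 1  and  ER'w = r
# KKT system: [2Q  A'; A  0] [w; lam] = [0; b],  with A = [1 1 1; ER]
A = np.vstack([np.ones(3), ER])
KKT = np.block([[2 * Q, A.T],
                [A, np.zeros((2, 2))]])
rhs = np.concatenate([np.zeros(3), [1.0, r]])
w = np.linalg.solve(KKT, rhs)[:3]

print(w, w @ Q @ w)  # weights sum to 1, hit the 10% target, with minimal variance
```

Any other feasible allocation, e.g. an equal split over the first two assets (which also returns 10% here), carries strictly higher variance.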

The interested reader can find the R code below (click to expand):

# Load packages
library(quantmod)  # downloading data
library(quadprog)  # solving quadratic programming problems
library(ggplot2)   # plotting

# Load data ----------------------------------------------------------
# DJ components (http://www.djaverages.com/?go=industrial-components)
DJIA_tickers <- c("MMM","AXP","AAPL","BA","CAT","CVX","CSCO","KO","DD","XOM",
                  "GE","GS","HD","INTC","IBM","JPM","JNJ","MCD","MRK","MSFT",
                  "NKE","PFE","PG","TRV","UTX","UNH","VZ","V","WMT","DIS")
end <- format(Sys.Date(), "%Y-%m-%d")  # end date: today
nyears <- 3
start <- format(Sys.Date() - (nyears*365), "%Y-%m-%d")
# create environment to load data into [neat trick on getting stock data into a single data frame]
dowloaded_data <- new.env()
getSymbols(DJIA_tickers, src = "yahoo", from = start, to = end, ascii = TRUE,
           auto.assign = TRUE, warnings = FALSE, symbol.lookup = FALSE,
           env = dowloaded_data)
# returns
Returns <- eapply(dowloaded_data, function(o) ROC(Ad(o), type = "continuous"))
ReturnsDF <- as.data.frame(do.call(merge, Returns))
# prices (for price-weighted index)
Prices <- eapply(dowloaded_data, Ad)
PricesDF <- as.data.frame(do.call(merge, Prices))
# Compute price-weighted index (our benchmark)
Index <- data.frame(apply(PricesDF, 1, mean))
Benchmark_returns <- ROC(Index, type = "continuous")

# ----- mean-variance optimization ------------------------------------
# define efficient frontier function
efrontier <- function(returns, shortsallowed = TRUE, leverageallowed = TRUE, minret = 0.01){
  # Markowitz efficient frontier
  # Minimizes portfolio variance for the minimum required return
  # Additional (optional) restrictions: shortsallowed, leverageallowed
  covariance <- cov(returns, use = "complete.obs")
  r <- matrix(colMeans(returns, na.rm = TRUE), nrow = 1)
  n <- ncol(covariance)
  Amat <- matrix(1, ncol = n)  # constraint Amat*x = bvec (portfolio allocations summing up to 1)
  bvec <- 1
  meq <- 1  # first constraint is treated as equality
  # add minimum return constraint
  Amat <- rbind(Amat, r)
  bvec <- rbind(bvec, minret)
  # Add constraint conditions if short selling is not allowed
  if (!shortsallowed){
    Amat <- rbind(Amat, diag(n))
    bvec <- rbind(bvec, matrix(0, nrow = n))
  }
  # Add constraint conditions if leverage (positions above 1) is not allowed
  if (!leverageallowed){
    Amat <- rbind(Amat, -diag(n))
    bvec <- rbind(bvec, matrix(-1, nrow = n))
  }
  dvec <- matrix(0, nrow = n)
  sol <- NULL
  try(sol <- solve.QP(covariance*2, dvec = t(dvec), Amat = t(Amat),
                      bvec = t(bvec), meq = meq), silent = TRUE)
  # note 1: solve.QP needs the transposed matrix Amat and transposed vector dvec
  # note 2: if the solution is unattainable under the restrictions then return NULL
  # note 3: a solution is not always available under the restrictions, therefore try()
  #         e.g. a 10000% minimum return with no leverage is not likely to lead to a portfolio
  if (!is.null(sol)){
    x <- data.frame(ER = sol$solution %*% t(r),
                    V = sqrt(sol$solution %*% covariance %*% sol$solution),
                    minret = minret, t(sol$solution))
  } else {
    x <- data.frame(ER = NA, V = NA, minret = minret, t(rep(NA, n)))
  }
  names(x)[4:(n+4-1)] <- names(returns)
  return(x)
}
# ---- end of function -------------------------------------------------

r <- colMeans(ReturnsDF, na.rm = TRUE)
rmin <- min(r)
rmax <- max(r)
# get efficient frontier with no leverage and no short selling
frontier_solution <- data.frame(ER = NULL, V = NULL, minret = NULL)
for (ret in seq(rmin, rmax, by = ((rmax-rmin)/100))){
  frontier_solution <- rbind(frontier_solution,
                             efrontier(ReturnsDF, minret = ret,
                                       shortsallowed = FALSE, leverageallowed = FALSE))
}
# get efficient frontier with leverage and short selling allowed
frontier_solution_unrestricted <- data.frame(ER = NULL, V = NULL, minret = NULL)
for (ret in seq(rmin, rmax, by = ((rmax-rmin)/100))){
  frontier_solution_unrestricted <- rbind(frontier_solution_unrestricted,
                                          efrontier(ReturnsDF, minret = ret,
                                                    shortsallowed = TRUE, leverageallowed = TRUE))
}
# compute statistics for the benchmark portfolio
index_portfolio <- data.frame(ER = matrix(colMeans(Benchmark_returns, na.rm = TRUE)),
                              V = sqrt(matrix(cov(Benchmark_returns, use = "complete.obs"))))

# --- Visualization of the output
# efficient frontier
eff_frontier <- ggplot(frontier_solution, aes(x = V*sqrt(250), y = ER*250)) +
  geom_line(alpha = 1, color = 'blue') +
  labs(x = "St. dev (annualized)", y = "Expected return (annualized)",
       title = "DJIA Efficient Frontier")
# index portfolio [point]
eff_frontier <- eff_frontier +
  geom_point(data = index_portfolio, aes(x = V*sqrt(250), y = ER*250, size = 3),
             show_guide = FALSE) +
  geom_vline(xintercept = index_portfolio$V*sqrt(250), colour = "red", alpha = 0.4) +
  annotate(geom = "text", x = index_portfolio$V*sqrt(250), y = index_portfolio$ER*250,
           label = paste("Risk: ", round(index_portfolio$V*100*sqrt(250), digits = 1),
                         "%\nReturn: ", round(index_portfolio$ER*100*250, digits = 1),
                         "%\nSharpe: ", round(sqrt(250)*index_portfolio$ER/index_portfolio$V, digits = 2),
                         sep = ""),
           hjust = -0.1, vjust = -0.1)
# add unrestricted frontier
eff_frontier <- eff_frontier +
  geom_line(data = frontier_solution_unrestricted, aes(x = V*sqrt(250), y = ER*250))
## find a better portfolio with the same amount of risk
better_portfolio_id <- which(min(abs(frontier_solution$V - index_portfolio$V), na.rm = TRUE) ==
                               abs(frontier_solution$V - index_portfolio$V))
better_portfolio <- data.frame(ER = frontier_solution$ER[better_portfolio_id],
                               V = frontier_solution$V[better_portfolio_id])
eff_frontier <- eff_frontier +
  geom_point(data = better_portfolio, aes(x = V*sqrt(250), y = ER*250, size = 3),
             show_guide = FALSE) +
  annotate(geom = "text", x = better_portfolio$V*sqrt(250), y = better_portfolio$ER*250,
           label = paste("Risk: ", round(better_portfolio$V*100*sqrt(250), digits = 1),
                         "%\nReturn: ", round(better_portfolio$ER*100*250, digits = 1),
                         "%\nSharpe: ", round(sqrt(250)*better_portfolio$ER/better_portfolio$V, digits = 2),
                         sep = ""),
           hjust = -0.1, vjust = -0.1)
# render the plot
eff_frontier
