library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
In scientific research and business practice, there are usually multiple covariates \(x_1,x_2,...,x_p\), \(p>1\), affecting the response \(y\). Obviously, simple linear regression is not sufficient if we want to model the relationship between \(y\) and \(x_1,x_2,...,x_p\). A natural generalisation of the simple linear model is \[ \text{mean}[y]=a+b_1x_1+b_2x_2+...+b_px_p. \] Such a model is called multiple linear regression or a multivariable linear model. In essence, the additional predictors are expected to explain the variation in the response that is not explained by a simple linear regression fit.
Adding more covariates means that we have to estimate more coefficients, namely \(b_1\), \(b_2\),…,\(b_p\). You may feel that multiple linear regression looks much more complicated than simple linear regression. The amazing fact is that we can still handle such a model with lm(). More importantly, most techniques we have learnt for simple linear regression remain valid for the multiple-variable case.
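Before working with real data, here is a minimal sketch with a small simulated toy data set (the names toy, x1, x2 and y are purely illustrative and are not used elsewhere in this workshop): covariates are simply joined by + in the lm() formula.
set.seed(1)
toy <- tibble(x1=rnorm(50), x2=rnorm(50)) |> mutate(y = 1 + 2*x1 - 0.5*x2 + rnorm(50))
lm(y ~ x1 + x2, data=toy)   # estimates of the intercept a and the slopes b1, b2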
In this exercise we’ll look at modelling sales using the advertising budgets on all three platforms. Recall that sales is correlated with youtube, facebook and newspaper.
Load the package datarium and the data marketing. We then convert the data set to a tidy tibble.
library(datarium)
data(marketing)
marketing <- marketing |> as_tibble()
marketing
## # A tibble: 200 × 4
## youtube facebook newspaper sales
## <dbl> <dbl> <dbl> <dbl>
## 1 276. 45.4 83.0 26.5
## 2 53.4 47.2 54.1 12.5
## 3 20.6 55.1 83.2 11.2
## 4 182. 49.6 70.2 22.2
## 5 217. 13.0 70.1 15.5
## 6 10.4 58.7 90 8.64
## 7 69 39.4 28.2 14.2
## 8 144. 23.5 13.9 15.8
## 9 10.3 2.52 1.2 5.76
## 10 240. 3.12 25.4 12.7
## # … with 190 more rows
A quick scan of the data set above reveals a critical fact: we have overlooked the multivariable nature of the data set. Observations in sales are obtained with different combinations of the advertising budgets on the three platforms. Therefore, all three variables youtube, facebook and newspaper may contribute to the corresponding sales, and there will be an overlap between the audiences from the different platforms. How can we quantify the contributions of these three variables? In addition, if we built three separate simple linear models, one for each covariate, we’d end up with three different predictions for sales. Which one should we choose?
To address the above issues, we have to model the relationship between sales and multiple covariates in one step. In this exercise we’ll extend our simple linear model to take into account some of the other variables in the marketing data.
We’ll start by adding facebook to our first simple linear model (lm.youtube) and producing the model summary of this extended model as follows.
lm.youbook <- lm(sales ~ youtube + facebook, data=marketing)
summary(lm.youbook)
##
## Call:
## lm(formula = sales ~ youtube + facebook, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.5572 -1.0502 0.2906 1.4049 3.3994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.50532 0.35339 9.919 <2e-16 ***
## youtube 0.04575 0.00139 32.909 <2e-16 ***
## facebook 0.18799 0.00804 23.382 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.018 on 197 degrees of freedom
## Multiple R-squared: 0.8972, Adjusted R-squared: 0.8962
## F-statistic: 859.6 on 2 and 197 DF, p-value: < 2.2e-16
Compare it to the model summary of lm.youtube below; you’ll notice a few things: the Estimate for youtube has changed, and there is now an additional row for facebook.
lm.youtube <- lm(sales ~ youtube, data=marketing)
summary(lm.youtube)
##
## Call:
## lm(formula = sales ~ youtube, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.0632 -2.3454 -0.2295 2.4805 8.6548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.439112 0.549412 15.36 <2e-16 ***
## youtube 0.047537 0.002691 17.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.91 on 198 degrees of freedom
## Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
## F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
Why do you think the Estimate (and thus Std. Error, t value, etc.) for youtube changed? What is the P-value for facebook testing here? Add some notes to your notebook about this.
Answer: The Estimate (and thus Std. Error, t value, etc.) for youtube changes because new information from facebook has been added to the linear model; a certain portion of the variability in sales is now explained by facebook. The P-value for facebook here tests whether the coefficient of facebook is equal to zero in this bivariable linear model. In effect, it tests whether the covariate facebook makes a sufficient contribution to explaining the variation in sales, given that the covariate youtube has already explained a significant portion of the variability in sales.
What is your conclusion about the relationship between
sales, youtube and
facebook?
Answer: Both youtube and facebook contribute to explaining the variability in sales through a bivariable linear function. The regression equation can be written as
sales = 3.50532 + 0.04575 × youtube + 0.18799 × facebook.
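The numbers in this equation are simply the fitted coefficients, which can be extracted directly from the model object:
coef(lm.youbook)   # the Estimate column of summary(lm.youbook)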
Now add in newspaper to the linear model with both
youtube and facebook, and check the model
summary as follows
lm.youboper <- lm(sales ~ youtube + facebook + newspaper, data=marketing)
summary(lm.youboper)
##
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.5932 -1.0690 0.2902 1.4272 3.3951
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.526667 0.374290 9.422 <2e-16 ***
## youtube 0.045765 0.001395 32.809 <2e-16 ***
## facebook 0.188530 0.008611 21.893 <2e-16 ***
## newspaper -0.001037 0.005871 -0.177 0.86
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.023 on 196 degrees of freedom
## Multiple R-squared: 0.8972, Adjusted R-squared: 0.8956
## F-statistic: 570.3 on 3 and 196 DF, p-value: < 2.2e-16
What is your conclusion about the relationship between
sales and newspaper?
Answer: After accounting for the contributions of youtube and facebook to modelling sales, newspaper does not offer any further improvement, as the P-value of newspaper is not significant at the 0.05 level.
Fit a simple linear model relating sales to only newspaper. What do you find in the R summary about the significance of newspaper? Why?
Answer: newspaper becomes significant in the simple linear model. Without knowing anything about youtube and facebook, we can still use newspaper to explain some of the variation in sales.
lm.newspaper <- lm(sales ~ newspaper, data=marketing)
summary(lm.newspaper)
##
## Call:
## lm(formula = sales ~ newspaper, data = marketing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.473 -4.065 -1.007 4.207 15.330
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.82169 0.74570 19.88 < 2e-16 ***
## newspaper 0.05469 0.01658 3.30 0.00115 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.111 on 198 degrees of freedom
## Multiple R-squared: 0.05212, Adjusted R-squared: 0.04733
## F-statistic: 10.89 on 1 and 198 DF,  p-value: 0.001148
The Multiple R-squareds of the three models,
lm.youtube, lm.youbook, and
lm.youboper are detailed as follows.
summary(lm.youtube)$r.squared
## [1] 0.6118751
summary(lm.youbook)$r.squared
## [1] 0.8971943
summary(lm.youboper)$r.squared
## [1] 0.8972106
Compare the Multiple R-squareds of the above three
models. What do you find?
Answer: Multiple R-squared gets larger as we add more covariates to the linear model, which reflects a better goodness of fit.
Let’s take a look at visualising your model with all three
variables using visreg. You’ll notice that it will produce
3 plots, one for each of the variables.
library(visreg)
visreg(lm.youboper,gg=TRUE)
(The three partial residual plots, one each for youtube, facebook and newspaper, are displayed here.)
These are called partial residual plots. What they
do is plot the relationship between the response and a single variable
after holding all other variables constant (usually at their median).
This allows you to see the effect of one variable after accounting for
the others. Notice the relationship with newspaper isn’t
very strong. You can choose a particular variable by using the xvar argument, e.g. visreg(lm.youboper, xvar="newspaper").
Compare these plots with the visualisation of the simple linear
models lm.youtube, lm.facebook,
lm.newspaper in Workshop C1. What do you find?
Answer: The slope of each plot has changed, especially for newspaper. This is not surprising, since the presence of new covariates updates the relationship between sales and the remaining covariates.
The diagnostics of a multivariable linear model follow the same principles as those of a simple linear model. Let’s take a look at
the model diagnostics for the model with all three covariates. Produce
4-in-1 plots using plot(). Add some notes to your R
notebook as to whether the model assumptions are satisfied.
Answer: We use par(mfrow=c(2,2)) to show all four plots together. It only works for base R plot(), so don’t use it for anything produced with ggplot(). The residuals vs fitted plot shows a bowl/bathtub trend, while the scale-location plot looks spindle shaped. Judging from the smoothed curves in them, both linearity and equal variance are highly questionable. The Q-Q plot also does not look very good, with some obvious outliers. The residuals vs leverage plot further confirms the existence of outliers.
par(mfrow=c(2,2))
plot(lm.youboper)
We can take a log of sales and refit a multivariable
linear model as follows
lm.youboper.log <- lm(log(sales) ~ youtube + facebook + newspaper, data=marketing)
Produce 4-in-1 diagnostic plots using plot(). Add
some notes to your R notebook as to whether the model assumptions are
satisfied.
Answer: All the plots suggest that there are a few outliers distorting the diagnostic plots. If we ignore those outliers, the residuals vs fitted and scale-location plots support the linearity and equal variance of the transformed model. You can try this yourself by removing the outliers from the data, re-fitting the model and producing the diagnostic plots (see the sketch after the plots below).
par(mfrow=c(2,2))
plot(lm.youboper.log)
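As a rough sketch of the outlier-removal idea mentioned above (the cutoff of 2 on the standardised residuals and the name marketing.clean are arbitrary illustrative choices, not a rule):
keep <- abs(rstandard(lm.youboper.log)) <= 2   # keep observations without large standardised residuals
marketing.clean <- marketing[keep, ]
lm.youboper.log.clean <- lm(log(sales) ~ youtube + facebook + newspaper, data=marketing.clean)
par(mfrow=c(2,2))
plot(lm.youboper.log.clean)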
We can still use predict() for predicting with a multivariable linear model. The only difference is that we need to specify each covariate in the data frame for newdata as
newbudget <- data.frame(youtube=0,facebook=0)
predict(lm.youbook,newdata=newbudget, interval='confidence')
## fit lwr upr
## 1 3.50532 2.808412 4.202228
One can modify interval and level to get
prediction or confidence intervals at different confidence levels.
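For example, a 90% prediction interval at the same zero budgets can be obtained as follows (a sketch).
predict(lm.youbook, newdata=newbudget, interval='prediction', level=0.90)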
Compare the above confidence interval with the confidence
intervals of two simple linear models lm.youtube and
lm.facebook at zero budgets. Discuss your
findings.
Answer: Both the confidence and prediction intervals become narrower in the multivariable linear model. Adding more covariates to our linear model explains more of the variability in sales, which yields better uncertainty quantification in prediction.
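A sketch of this comparison (lm.facebook is assumed to be the simple linear model from Workshop C1 and is refitted here):
lm.facebook <- lm(sales ~ facebook, data=marketing)   # assumed from Workshop C1
predict(lm.youtube, newdata=data.frame(youtube=0), interval='confidence')
predict(lm.facebook, newdata=data.frame(facebook=0), interval='confidence')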
In Ex1, we have seen that, by including more covariates in lm(), the goodness of fit of our linear models, i.e. \(R^2\), can be improved. Even though the new covariate is insignificant in the original model, the \(R^2\) of lm.youboper is slightly higher than that of lm.youbook. It is not hard to conclude that lm.youbook is much better than lm.youtube. But how can we choose between lm.youbook and lm.youboper? These two models seem to be in a dead heat with each other.
A tricky fact is that \(R^2\) always improves, no matter what covariate is added to a linear model. This can be demonstrated by the following simulation study.
Let’s add in an additional variable to marketing as
follows
set.seed(2020)
marketing.sim <- marketing |> mutate(noise=rnorm(200))
This additional variable (noise) is simulated from a standard normal distribution. Of course, noise does not contribute any information about sales. But let us add it to lm() and produce a model summary as follows
lm.youboper.sim <- lm(sales ~ youtube + facebook + newspaper + noise, data=marketing.sim)
summary(lm.youboper.sim)
##
## Call:
## lm(formula = sales ~ youtube + facebook + newspaper + noise,
## data = marketing.sim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.6014 -0.9853 0.2846 1.4330 3.2370
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.550460 0.375408 9.458 <2e-16 ***
## youtube 0.045699 0.001397 32.701 <2e-16 ***
## facebook 0.188528 0.008615 21.883 <2e-16 ***
## newspaper -0.001381 0.005886 -0.235 0.815
## noise -0.114642 0.127544 -0.899 0.370
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.024 on 195 degrees of freedom
## Multiple R-squared: 0.8976, Adjusted R-squared: 0.8955
## F-statistic: 427.5 on 4 and 195 DF, p-value: < 2.2e-16
From the R summary, it is not surprising to find that noise is insignificant.
Compare the Multiple R-squared of
lm.youboper.sim with those of lm.youtube,
lm.youbook, and lm.youboper. What do you
find?
Answer: The Multiple R-squared of lm.youboper.sim is larger than those of all the above models, which suggests a (slightly) better goodness of fit.
The insignificance of a covariate does not mean that it is certainly unrelated to the response. In our data set marketing.sim, noise is a genuinely redundant variable containing no information, but newspaper, though also insignificant in the model, is still correlated with sales. The correct interpretation is that, after extracting the information about sales from youtube and facebook, newspaper becomes insignificant in explaining the remaining variation in sales.
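This can be checked directly: the marginal correlation below should equal the square root of the Multiple R-squared of lm.newspaper reported earlier (roughly 0.23).
cor(marketing$newspaper, marketing$sales)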
The guaranteed improvement in \(R^2\) from adding more covariates can be dangerous, as it may lead to over-complicated models. In statistical modelling, a very important practical guideline is Occam’s razor, or the law of parsimony, which is the problem-solving principle that “entities should not be multiplied without necessity”. If two models provide similar fits to the real data set, we tend to keep the more parsimonious one, i.e. the model with fewer covariates.
For practitioners, a simple but effective idea is to remove the insignificant covariates from our linear model. In addition, we have the Adjusted R-squared in the R summary to help us find the most concise model with a sufficient goodness of fit. Adjusted R-squared is modified from Multiple R-squared by taking the complexity of the linear model (the number of covariates) into consideration. These numerical indicators can be extracted directly as follows.
summary(lm.youtube)$adj.r.squared
## [1] 0.6099148
summary(lm.youbook)$adj.r.squared
## [1] 0.8961505
summary(lm.youboper)$adj.r.squared
## [1] 0.8956373
summary(lm.youboper.sim)$adj.r.squared
## [1] 0.895535
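Adjusted \(R^2\) penalises \(R^2\) for the number of covariates \(p\) via \[ \bar{R}^2=1-(1-R^2)\frac{n-1}{n-p-1}. \] A quick sanity check for lm.youbook, where \(n=200\) and \(p=2\):
r2 <- summary(lm.youbook)$r.squared
1 - (1 - r2)*(200 - 1)/(200 - 2 - 1)   # should match summary(lm.youbook)$adj.r.squared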
Find the best model which balances the complexity and the
goodness of fit by using Adjusted R-squared.
Answer: The best model is lm.youbook. Not surprisingly, noise does not contribute any information, and the information in newspaper relevant to sales has already been captured by youtube and facebook.
Another tool for examining the necessity of including one or more covariates in our model is the ANalysis Of VAriance. Note that lm.youtube is a model reduced from lm.youbook by setting the coefficient of facebook to zero. Therefore, just like comparing the linear trend model and the quadratic trend model in Workshop C4, anova() can test whether the improvement in fit (the reduction in the residual sum of squares) from adding one covariate is significant, as follows
anova(lm.youtube,lm.youbook)
## Analysis of Variance Table
##
## Model 1: sales ~ youtube
## Model 2: sales ~ youtube + facebook
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 198 3027.64
## 2 197 801.96 1 2225.7 546.74 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The above ANOVA table suggests we shall keep the extended model
lm.youbook.
Similarly, lm.youbook is a model reduced from lm.youboper.sim by setting the coefficients of newspaper and noise to zero. The advantage of anova() is that it can check the pros and cons of two or more covariates as a group simultaneously, as follows,
anova(lm.youbook,lm.youboper.sim)
## Analysis of Variance Table
##
## Model 1: sales ~ youtube + facebook
## Model 2: sales ~ youtube + facebook + newspaper + noise
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 197 801.96
## 2 195 798.52 2 3.4361 0.4196 0.6579
Df in the second row of the ANOVA table is the number of
additional coefficients being tested. It is not hard to decide that we
shall keep the reduced model lm.youbook.
lm.youbook is involved in both ANOVA tables. But its
roles in the two tables are different. It is the extended model in the
first table but becomes the reduced one in the second. Just be careful
when using anova() and make sure that you are putting a
model at its correct position.
Re-run anova() by swapping the reduced model with
the extended model. Discuss your findings.
Answer: Negative Df and Sum of Sq; the same F and P-value. The conclusion is the same, i.e. keep the simpler model.
anova(lm.youboper.sim,lm.youbook)
## Analysis of Variance Table
##
## Model 1: sales ~ youtube + facebook + newspaper + noise
## Model 2: sales ~ youtube + facebook
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 195 798.52
## 2 197 801.96 -2 -3.4361 0.4196 0.6579
We can directly call anova() on a fitted linear model, without specifying a pair of reduced and extended models, as
anova(lm.youboper.sim)
## Analysis of Variance Table
##
## Response: sales
## Df Sum Sq Mean Sq F value Pr(>F)
## youtube 1 4773.1 4773.1 1165.5873 <2e-16 ***
## facebook 1 2225.7 2225.7 543.5169 <2e-16 ***
## newspaper 1 0.1 0.1 0.0312 0.8600
## noise 1 3.3 3.3 0.8079 0.3698
## Residuals 195 798.5 4.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The above ANOVA table tests the necessity of including each covariate, one by one, after the previous covariate(s) have been added to the linear model sequentially.
The first row, corresponding to youtube, tells us that adding youtube makes more sense than including no covariate at all in modelling sales (such a naive model can be fitted by lm(sales~1, data=marketing)). The second row, corresponding to facebook, tells us that adding facebook still makes sense even after youtube has been added to the model. The third (fourth) row suggests that, after accounting for the previous two (three) covariates, the covariate newspaper (noise) contributes little information to modelling sales.
We can permute the order of covariates in lm() and
generate the corresponding ANOVA table as follows
lm.youboper.sim.2 <- lm(sales ~ youtube + noise + newspaper + facebook,data=marketing.sim)
anova(lm.youboper.sim.2)
## Analysis of Variance Table
##
## Response: sales
## Df Sum Sq Mean Sq F value Pr(>F)
## youtube 1 4773.1 4773.1 1165.5873 < 2.2e-16 ***
## noise 1 8.7 8.7 2.1346 0.1456
## newspaper 1 259.5 259.5 63.3759 1.377e-13 ***
## facebook 1 1960.9 1960.9 478.8456 < 2.2e-16 ***
## Residuals 195 798.5 4.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpret each row of this ANOVA table and compare it with the previous ANOVA table. Discuss your findings.
Answer: newspaper becomes significant. This means that, after accounting for youtube and noise, newspaper still contains useful information for predicting sales. But facebook has not yet been considered at that point. The summary table can be further used to check the significance of the coefficients.
The ANOVA table relies heavily on the \(F\)-test, as we mentioned in Workshop C4. From the R summaries of all four models above, we can access a row called F-statistic. The F-statistics of all four models are detailed as follows.
summary(lm.youtube)$fstatistic
## value numdf dendf
## 312.145 1.000 198.000
summary(lm.youbook)$fstatistic
## value numdf dendf
## 859.6177 2.0000 197.0000
summary(lm.youboper)$fstatistic
## value numdf dendf
## 570.2707 3.0000 196.0000
summary(lm.youboper.sim)$fstatistic
## value numdf dendf
## 427.4858 4.0000 195.0000
Check the value and numdf of
F-statistic. Discuss your findings.
Answer: The best model has the highest value of F-statistic. numdf is exactly the number of coefficients (excluding the intercept).
The F-statistic reported in the R summary also comes from an ANOVA table, which tests a null model with all coefficients being zero (the reduced model) against the fitted linear model (the extended model).
The \(F\)-test is called an omnibus test as it tests whether all coefficients in a linear model are equal to zero as a group. In mathematical terms, the null hypothesis can be written as \[ H_0: b_1=b_2=...=b_p=0. \] If any \(b_i\) is significantly non-zero, the null hypothesis will be rejected, leading to the conclusion that there exists at least one covariate in our data set explaining the variation in the response \(y\).
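As a quick check, comparing the intercept-only (null) model against lm.youbook with anova() should reproduce the F-statistic of 859.6 on 2 and 197 DF reported by summary(lm.youbook). (lm.null is just an illustrative name.)
lm.null <- lm(sales ~ 1, data=marketing)   # the naive model with no covariates
anova(lm.null, lm.youbook)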
If the F-statistic in your R summary is insignificant, you need to double check your data set to make sure that the data set itself makes sense.
The solution is not provided for the optional exercise.
In a multivariate data set, we usually have a response \(y\) and multiple covariates \(x_1\), \(x_2\),…, \(x_p\). A multivariable linear model aims to model the relationship between \(y\) and the \(x\)’s. We are expecting that the variations in \(y\) can be well explained by including a suitable number of covariates as discussed in the previous exercise.
A side effect of having multiple covariates is that there are correlations between the covariates themselves. Even if those correlations are weak, specific combinations of correlations between several covariates can lead to ill-posed results in our linear model.
In this exercise, we will study this critical issue arising from many real data sets, i.e. collinearity or multicollinearity.
Let’s simulate a dataset as follows.
set.seed(2021)
n <- 20
demo <- tibble(x1=1:n,x2=sample(1:n),e=rnorm(n)) |> mutate(y=0.5*x1+0.5*x2+e)
demo
## # A tibble: 20 × 4
## x1 x2 e y
## <int> <int> <dbl> <dbl>
## 1 1 7 0.182 4.18
## 2 2 6 1.51 5.51
## 3 3 14 1.60 10.1
## 4 4 20 -1.84 10.2
## 5 5 12 1.62 10.1
## 6 6 4 0.131 5.13
## 7 7 19 1.48 14.5
## 8 8 18 1.51 14.5
## 9 9 11 -0.942 9.06
## 10 10 13 -0.186 11.3
## 11 11 5 -1.10 6.90
## 12 12 17 1.21 15.7
## 13 13 1 -1.62 5.38
## 14 14 9 0.105 11.6
## 15 15 15 -1.46 13.5
## 16 16 3 -0.354 9.15
## 17 17 16 -0.0937 16.4
## 18 18 2 1.10 11.1
## 19 19 8 -1.96 11.5
## 20 20 10 -1.45 13.6
It is easy to fit a linear model based on the simulated data set
demo as
lm.demo <- lm(y ~ x1 + x2, data=demo)
summary(lm.demo)
##
## Call:
## lm(formula = y ~ x1 + x2, data = demo)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60938 -0.95633 0.09897 0.89777 1.99849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.75665 0.82349 0.919 0.371
## x1 0.40593 0.04757 8.533 1.50e-07 ***
## x2 0.51938 0.04757 10.917 4.21e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.198 on 17 degrees of freedom
## Multiple R-squared: 0.9036, Adjusted R-squared: 0.8922
## F-statistic: 79.65 on 2 and 17 DF, p-value: 2.321e-09
Now let’s add another predictor, x3, which is the sum of the other two predictors, to the tibble
demo.e.collin <- demo |> mutate(x3=x1+x2)
demo.e.collin
## # A tibble: 20 × 5
## x1 x2 e y x3
## <int> <int> <dbl> <dbl> <int>
## 1 1 7 0.182 4.18 8
## 2 2 6 1.51 5.51 8
## 3 3 14 1.60 10.1 17
## 4 4 20 -1.84 10.2 24
## 5 5 12 1.62 10.1 17
## 6 6 4 0.131 5.13 10
## 7 7 19 1.48 14.5 26
## 8 8 18 1.51 14.5 26
## 9 9 11 -0.942 9.06 20
## 10 10 13 -0.186 11.3 23
## 11 11 5 -1.10 6.90 16
## 12 12 17 1.21 15.7 29
## 13 13 1 -1.62 5.38 14
## 14 14 9 0.105 11.6 23
## 15 15 15 -1.46 13.5 30
## 16 16 3 -0.354 9.15 19
## 17 17 16 -0.0937 16.4 33
## 18 18 2 1.10 11.1 20
## 19 19 8 -1.96 11.5 27
## 20 20 10 -1.45 13.6 30
Notice that, in the way we are generating this data, the response y only really depends on x1 and x2. What happens when we attempt to fit a regression model in R using all three predictors?
lm.demo.e.collin <- lm(y ~ x1 + x2 + x3, data=demo.e.collin)
summary(lm.demo.e.collin)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = demo.e.collin)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60938 -0.95633 0.09897 0.89777 1.99849
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.75665 0.82349 0.919 0.371
## x1 0.40593 0.04757 8.533 1.50e-07 ***
## x2 0.51938 0.04757 10.917 4.21e-09 ***
## x3 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.198 on 17 degrees of freedom
## Multiple R-squared: 0.9036, Adjusted R-squared: 0.8922
## F-statistic: 79.65 on 2 and 17 DF, p-value: 2.321e-09
We see that R simply decides to exclude the variable x3.
Try to add another variable x4=x2-x1 and re-fit a
linear model y~x1+x2+x4. Discuss your findings.
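A sketch of the code for this exercise (demo.x4 and lm.demo.x4 are illustrative names; the discussion is left to you):
demo.x4 <- demo |> mutate(x4 = x2 - x1)   # x4 is an exact linear combination of x1 and x2
lm.demo.x4 <- lm(y ~ x1 + x2 + x4, data=demo.x4)
summary(lm.demo.x4)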
What if we do not remove x3? This creates big trouble for R, as a bit of arithmetic will show that y=0.5x1+0.5x2 is equivalent to y=0.5x3. More crazily, we also have y=408.5x3-408x2-408x1. There are infinitely many combinations of coefficients for our underlying linear model.
Why is this happening? It is simply because x3 can be predicted perfectly from x1 and x2 with the linear formula x3=x2+x1. The information contained in x3 is redundant given the information from x1 and x2.
When this happens, we say there is exact or
perfect collinearity in the dataset. As a result of
this issue, R essentially chose to fit the model
y ~ x1 + x2 which agrees with the true underlying data
generation mechanism.
However, notice that other orderings of the covariates generate different R summaries
lm.demo.e.collin.2 <- lm(y ~ x1 + x3 + x2, data=demo.e.collin)
summary(lm.demo.e.collin.2)
##
## Call:
## lm(formula = y ~ x1 + x3 + x2, data = demo.e.collin)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60938 -0.95633 0.09897 0.89777 1.99849
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.75665 0.82349 0.919 0.3710
## x1 -0.11344 0.05961 -1.903 0.0741 .
## x3 0.51938 0.04757 10.917 4.21e-09 ***
## x2 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.198 on 17 degrees of freedom
## Multiple R-squared: 0.9036, Adjusted R-squared: 0.8922
## F-statistic: 79.65 on 2 and 17 DF, p-value: 2.321e-09
lm.demo.e.collin.3 <- lm(y ~ x2 + x3 + x1, data=demo.e.collin)
summary(lm.demo.e.collin.3)
##
## Call:
## lm(formula = y ~ x2 + x3 + x1, data = demo.e.collin)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.60938 -0.95633 0.09897 0.89777 1.99849
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.75665 0.82349 0.919 0.3710
## x2 0.11344 0.05961 1.903 0.0741 .
## x3 0.40593 0.04757 8.533 1.5e-07 ***
## x1 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.198 on 17 degrees of freedom
## Multiple R-squared: 0.9036, Adjusted R-squared: 0.8922
## F-statistic: 79.65 on 2 and 17 DF, p-value: 2.321e-09
The order of covariates in lm() matters, just as it does in anova(). R fits the models y ~ x1 + x3 and y ~ x2 + x3, respectively.
Given the fact that x3=x1+x2, a bit of arithmetic will reveal that the above three model fits, lm.demo.e.collin, lm.demo.e.collin.2, and lm.demo.e.collin.3, are essentially equivalent.
This can be further confirmed by the fitted values and residuals of the three models. Extract the fitted values and residuals of the above three models and compare them.
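For example (a sketch; both comparisons should return TRUE):
all.equal(fitted(lm.demo.e.collin), fitted(lm.demo.e.collin.2))
all.equal(residuals(lm.demo.e.collin), residuals(lm.demo.e.collin.3))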
This is a result of all of the information contained in x3 being derived from x1 and x2. As long as one of x1 or x2 is included in the model, x3 can be used to recover the information from the variable not included.
While their fitted values (and residuals) are all the same, their estimated coefficients are quite different; the coefficient of x1 even switches sign in one of the models! So only lm.demo.e.collin properly explains the relationship between the variables. lm.demo.e.collin.2 and lm.demo.e.collin.3 still predict as well as lm.demo.e.collin, despite their coefficients having little to no meaning, a concept we will return to later.
Exact collinearity is an extreme example of collinearity, which occurs in multiple regression when predictor variables are highly correlated. From the above two steps, it seems that exact collinearity is not a big deal, since lm() can handle it automatically. Indeed, exact collinearity can be resolved easily, but let us add a small random perturbation to x3 as follows
set.seed(2022)
demo.collin <- demo |> mutate(x3.r=x1+x2+rnorm(n,sd=0.01))
demo.collin
## # A tibble: 20 × 5
## x1 x2 e y x3.r
## <int> <int> <dbl> <dbl> <dbl>
## 1 1 7 0.182 4.18 8.01
## 2 2 6 1.51 5.51 7.99
## 3 3 14 1.60 10.1 17.0
## 4 4 20 -1.84 10.2 24.0
## 5 5 12 1.62 10.1 17.0
## 6 6 4 0.131 5.13 9.97
## 7 7 19 1.48 14.5 26.0
## 8 8 18 1.51 14.5 26.0
## 9 9 11 -0.942 9.06 20.0
## 10 10 13 -0.186 11.3 23.0
## 11 11 5 -1.10 6.90 16.0
## 12 12 17 1.21 15.7 29.0
## 13 13 1 -1.62 5.38 14.0
## 14 14 9 0.105 11.6 23.0
## 15 15 15 -1.46 13.5 30.0
## 16 16 3 -0.354 9.15 19.0
## 17 17 16 -0.0937 16.4 33.0
## 18 18 2 1.10 11.1 20.0
## 19 19 8 -1.96 11.5 27.0
## 20 20 10 -1.45 13.6 30.0
Now x3.r is no longer exactly the sum of x1 and x2. Without knowing the random perturbation generated by rnorm(n, sd=0.01), we won’t be able to recover x2 from x1 and x3.r, or vice versa. A tri-variable linear model fitted to this data set is given as follows
lm.demo.collin <- lm(y~x1+x2+x3.r, data=demo.collin)
summary(lm.demo.collin)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3.r, data = demo.collin)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.81821 -0.68708 0.05492 0.81924 1.77835
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5182 0.8915 0.581 0.569
## x1 22.8368 29.6306 0.771 0.452
## x2 22.9405 29.6178 0.775 0.450
## x3.r -22.4176 29.6131 -0.757 0.460
##
## Residual standard error: 1.213 on 16 degrees of freedom
## Multiple R-squared: 0.9069, Adjusted R-squared: 0.8895
## F-statistic: 51.96 on 3 and 16 DF, p-value: 1.804e-08
Unlike the exact collinearity case, here we can still fit a model with all of the predictors, but what effect does this have? All three coefficients become insignificant!
One of the first things we should notice is that the \(F\)-test for the regression tells us that the regression is significant, yet no individual predictor is. Another interesting result is the opposite signs of the coefficients for x1 and x3.r. This should seem rather counter-intuitive: increasing x1 increases y, but increasing x3.r decreases y? This happens because one or more predictors can be modelled by the other predictors with a linear model. For example, the x1 variable explains a large amount of the variation in x3.r. When they are both in the model, their individual effects on the response are lessened, but together they still explain a large portion of the variation of y.
Actually, the least-squares Estimates remain unbiased under collinearity, but their Std. Errors become far too large, which results in small t values and large P-values. Including a variable like x3.r in our linear model is very dangerous for our statistical inference: it distorts the model summary by inflating the Std. Error, which in turn leads to misleading P-values.
In some cases, we can identify the collinearity in our data set
before fitting a linear model. This can be done via the pairs
plot produced by ggpairs() from the R package
GGally as
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
demo.collin |> select(-e) |> ggpairs()
The diagonal of the pairs plot depicts the density of each variable in the data set. The lower triangle collects the scatter plots of different pairs of variables, and the upper triangle summarises the corresponding correlation coefficients. The pairs plot provides an efficient way to visualise a multivariable data set.
From the above pairs plot, we can see that x3.r is highly linearly correlated with y. If two predictors are highly correlated, meaning one can be predicted from the other through a straight line, this can also be identified from the pairs plot.
Produce the pairs plot for the data set marketing.
Can you identify any issues in this data set?
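A sketch for this exercise (GGally is already loaded above):
marketing |> ggpairs()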
However, it may not be easy to identify collinearity in a data set with just a pairs plot. For example, in the last step we found that x3.r is highly correlated with y, but this does not really reveal the true collinearity between the covariates. We need a better tool to spot those duplicates hidden in the data set.
Notice that the Std.Errors in the previous summary of lm.demo.collin are abnormally large. We use the so-called Variance Inflation Factor (VIF) to detect possible collinearities in a multivariable data set. The variance inflation factor quantifies the effect of collinearity on the variance of our regression estimates. The VIF for each of the predictors in a linear model can be calculated by vif() from the R package faraway as follows.
library(faraway)
##
## Attaching package: 'faraway'
## The following object is masked from 'package:GGally':
##
## happy
vif(lm.demo.collin)
## x1 x2 x3.r
## 396517.8 396174.1 622232.7
vif(lm.youboper)
## youtube facebook newspaper
## 1.004611 1.144952 1.145187
In practice, it is common to say that any VIF greater than 5 is cause for concern. So in this example we see there is a huge multicollinearity issue, as all three predictors in lm.demo.collin have VIFs far greater than 5.
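As a sanity check on this definition, the VIF of x1 in lm.demo.collin can be reproduced by hand: regress x1 on the other two predictors and compute \(1/(1-R^2)\).
r2.x1 <- summary(lm(x1 ~ x2 + x3.r, data=demo.collin))$r.squared
1/(1 - r2.x1)   # should match vif(lm.demo.collin)["x1"]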
Check the VIFs of lm.youboper. Can you identify any
issues in this data set?