LiNGAM by R

This page gives examples of running LiNGAM in R.

Analysis to examine the coefficients of the model

This section shows how to recover the coefficients of the model formula that generated the data.

The data used here was created with the following formula. The error term e is drawn from a uniform distribution.
[Figure: formula used to create the data]
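The author's exact formula is not reproduced in the text, so the following is only a hypothetical stand-in that shows the general shape of such data: each variable is a linear function of earlier variables plus a uniform error term. The coefficients (2 and -1), the variable names, and the sample size are assumptions made for illustration. Uniform errors are a natural choice because LiNGAM relies on the error terms being non-Gaussian.

```r
# Hypothetical data-generating model (NOT the author's formula):
#   X1 = e1
#   X2 =  2 * X1 + e2
#   X3 = -1 * X2 + e3
set.seed(1)                        # make the random numbers reproducible
N  <- 1000                         # number of rows (assumed)
e1 <- runif(N, min = -1, max = 1)  # uniform, i.e. non-Gaussian, errors
e2 <- runif(N, min = -1, max = 1)
e3 <- runif(N, min = -1, max = 1)

X1 <- e1
X2 <- 2 * X1 + e2
X3 <- -1 * X2 + e3

Data <- data.frame(X1 = X1, X2 = X2, X3 = X3)
head(Data)
# write.csv(Data, "Data.csv", row.names = FALSE)  # uncomment to save for read.csv()
```

With data of this kind, a successful run should return coefficients close to 2 (X1 to X2) and -1 (X2 to X3).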

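library(pcalg)
setwd("C:/Rtest")  # folder that contains Data.csv
Data <- read.csv("Data.csv", header = TRUE)
Str <- lingam(Data)$Bpruned  # LiNGAM: pruned coefficient matrix
rownames(Str) <- colnames(Data)  # label rows and columns with the variable names
colnames(Str) <- colnames(Data)
Str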
[Figure: LiNGAM output (estimated coefficient matrix)]

The estimated values are almost the same as the coefficients used to create the data.

Analysis of the magnitude of the effect of variables

When you want to compare the magnitude of the influence of each variable in Multi-Regression Analysis, it is wrong to simply compare the raw coefficients of the regression equation. Compare the standardized partial regression coefficients instead. A standardized partial regression coefficient is the coefficient obtained by standardizing the variables and then running multiple regression analysis.
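A minimal sketch of why the raw coefficients can mislead, using made-up data in base R. The variable names, coefficients, and spreads are assumptions chosen so that the two rankings disagree: x2 has a small raw coefficient but a large spread.

```r
# Hypothetical data: x1 has a small spread, x2 a large one.
set.seed(1)
N  <- 500
x1 <- rnorm(N, sd = 1)
x2 <- rnorm(N, sd = 100)
y  <- 3 * x1 + 0.05 * x2 + rnorm(N)

# Raw partial regression coefficients (intercept dropped)
raw <- coef(lm(y ~ x1 + x2))[-1]

# Standardize every variable, then fit the same model
d   <- as.data.frame(scale(data.frame(y = y, x1 = x1, x2 = x2)))
std <- coef(lm(y ~ x1 + x2, data = d))[-1]

raw  # x1 looks far more important here
std  # after standardization x2 has the larger coefficient
```

The standardized coefficient equals the raw coefficient multiplied by sd(x)/sd(y), which is why a variable with a large spread can matter more than its raw coefficient suggests.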

If you use LiNGAM for causal analysis, you will make the same mistake if you simply read the raw LiNGAM output. Here too, standardize the data first and then run LiNGAM.

The resulting coefficients are then used as the measure of influence and are drawn as the thickness of the edges in the network graph.
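The standardization step can be written as an explicit loop over the columns or with base R's scale(), which centers each column and divides it by its standard deviation by default. The sketch below checks that the two give the same result on a small hypothetical data frame (the column names and values are made up).

```r
# Hypothetical small data set, for illustration only.
Data <- data.frame(A = c(1, 2, 3, 4, 5),
                   B = c(10, 30, 20, 50, 40))

# Loop form: subtract the mean and divide by the standard deviation
Loop <- Data
for (i in 1:ncol(Loop)) {
  Loop[, i] <- (Loop[, i] - mean(Loop[, i])) / sd(Loop[, i])
}

# Same operation with scale()
Scaled <- as.data.frame(scale(Data))

# Largest absolute difference between the two results
MaxDiff <- max(abs(as.matrix(Loop) - as.matrix(Scaled)))
MaxDiff
```

After either version, every column has mean 0 and standard deviation 1, so coefficients estimated from the columns are on a comparable scale.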

LiNGAM + network graph

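library(pcalg)
library(igraph)
setwd("C:/Rtest")
Data <- read.csv("Data.csv", header = TRUE)
for (i in 1:ncol(Data)) {
  Data[,i] <- (Data[,i] - mean(Data[,i]))/sd(Data[,i])  # standardize each column
}
Str <- lingam(Data)$Bpruned  # LiNGAM
rownames(Str) <- colnames(Data)
colnames(Str) <- colnames(Data)
GM2 <- t(abs(Str))  # transpose so that rows are causes and columns are effects
GM3 <- GM2*5/max(GM2)  # rescale so that the strongest edge has width 5
# graph_from_adjacency_matrix() replaces the deprecated graph.adjacency()
GM4 <- graph_from_adjacency_matrix(GM3, weighted = TRUE, mode = "directed")
plot(GM4, edge.width = E(GM4)$weight)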

[Figure: network graph of the LiNGAM result]

In this graph, each arrow means that the variable at the head of the arrow is the variable at the tail of the arrow with an error term added.
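To make the reading of the arrows concrete, the following base R sketch turns a small coefficient matrix into a list of edges. The matrix is hypothetical; it assumes the same orientation as the code above, where entry [i, j] is the effect of the column variable j on the row variable i (which is why the code transposes the matrix before building the graph).

```r
# Hypothetical coefficient matrix: B[i, j] = effect of column j on row i.
vars <- c("X1", "X2", "X3")
B <- matrix(c(0.0,  0.0, 0,
              0.9,  0.0, 0,
              0.0, -0.6, 0),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

idx <- which(B != 0, arr.ind = TRUE)  # row = effect, col = cause

Edges <- data.frame(from  = vars[idx[, "col"]],              # tail of the arrow
                    to    = vars[idx[, "row"]],              # head of the arrow
                    coef  = B[idx],                          # signed coefficient
                    width = 5 * abs(B[idx]) / max(abs(B)))   # same scaling as GM3
Edges
```

Each row of Edges corresponds to one arrow in the plot: the "to" variable is modeled as the coefficient times the "from" variable, plus an error term.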

When not using pcalg

The pcalg package depends on other packages that must be installed before it can be used. This appears to be why the code above fails with an error in some environments.
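One way to find out in advance whether pcalg can be loaded is base R's requireNamespace(), which returns TRUE or FALSE instead of stopping with an error. This is only a sketch; the variable name UsePcalg is made up for illustration.

```r
# TRUE if pcalg and its dependencies can be loaded, FALSE otherwise.
UsePcalg <- requireNamespace("pcalg", quietly = TRUE)

if (UsePcalg) {
  message("pcalg is available; lingam() can be used.")
} else {
  message("pcalg is not available; use code that does not depend on it.")
}
```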

So here is code that runs LiNGAM without pcalg. It is my own code, and R-EDA1 uses it as well.

I did not understand the matrix rearrangement by the Hungarian method described in the reference. Instead, the code shuffles the rows at random 10,000 times and keeps the ordering that minimizes the sum of the reciprocals of the absolute values of the diagonal elements. Since 7! = 5040, 10,000 random draws cover a large share of the possible orderings when there are up to 7 variables. With 8 or more variables (8! = 40320), the result may be a good match, but it may not be the true minimum.
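As a cross-check on the random search described above, the sketch below enumerates every row ordering of a small matrix and compares the exact minimum with the result of 10,000 random draws. It is an illustration in base R, not the method from the reference: the helper names all_perms, perm_cost, and coverage are made up, and the 4 x 4 matrix is a random stand-in for the ICA result. The last lines estimate the chance that one particular ordering (for example the best one) is drawn at least once in 10,000 tries.

```r
# All permutations of a vector (recursive; only practical for small lengths).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Cost of a row ordering p: sum of 1/|diagonal element| after reordering.
perm_cost <- function(W, p) sum(1 / abs(diag(W[p, , drop = FALSE])))

# Hypothetical 4 x 4 matrix standing in for the ICA result.
set.seed(1)
W <- matrix(rnorm(16), 4, 4)

Perms <- all_perms(1:4)                        # 4! = 24 orderings
Costs <- sapply(Perms, function(p) perm_cost(W, p))
Best  <- Perms[[which.min(Costs)]]             # exact minimizer
Best
min(Costs)

# Random search with 10,000 draws, for comparison.
RandMin <- min(replicate(10000, perm_cost(W, sample(4))))
RandMin

# Probability that a given ordering is drawn at least once in 10,000 tries.
coverage <- function(n, draws = 10000) 1 - (1 - 1 / factorial(n))^draws
coverage(7)
coverage(8)
```

Exhaustive enumeration grows as n!, so it is only a check for small n. For larger problems, an assignment solver such as solve_LSAP() in the clue package is one existing implementation of the Hungarian method that could be tried.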

library(igraph)
library(fastICA)
setwd("C:/Rtest")

Data <- read.csv("Data.csv", header = TRUE)
n <- ncol(Data)
for (i in 1:n) {
  Data[,i] <- (Data[,i] - mean(Data[,i]))/sd(Data[,i])  # standardize each column
}
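fICA <- fastICA(Data, n)  # independent component analysis with n components

K <- fICA$K  # pre-whitening matrix
W <- fICA$W  # un-mixing matrix
tKW <- t(K %*% W)  # rows = independent components, columns = variables
MintKW <- tKW
MinSum <- sum(diag(1/abs(tKW)))  # cost of the initial row order
for (i in 1:10000) {
  tKW2 <- tKW[sample(n),]  # random reordering of the rows
  SumdiagRabstKW <- sum(diag(1/abs(tKW2)))
  if (MinSum > SumdiagRabstKW) {  # keep the ordering with the smallest cost
    MinSum <- SumdiagRabstKW
    MintKW <- tKW2
  }
}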
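MintKW2 <- MintKW
for (i in 1:n) {
  MintKW2[i,] <- MintKW2[i,]/MintKW2[i,i]  # scale each row so that the diagonal is 1
}
Str <- diag(n) - MintKW2  # LiNGAM coefficients: B = I - W (signs matter only if Str is read directly)
diag(Str) <- 0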

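rownames(Str) <- colnames(Data)
colnames(Str) <- colnames(Data)
GM2 <- t(abs(Str))  # transpose so that rows are causes and columns are effects
GM3 <- GM2*5/max(GM2)  # rescale so that the strongest edge has width 5
GM3[GM3<3] <- 0  # drop weak edges (scaled weight below 3)
# graph_from_adjacency_matrix() replaces the deprecated graph.adjacency()
GM4 <- graph_from_adjacency_matrix(GM3, weighted = TRUE, mode = "directed")
plot(GM4, edge.width = E(GM4)$weight)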

[Figure: network graph of the LiNGAM result without pcalg]