2021-02-23

GitHub Pagesで個人Websiteを作成した

GitHub Pagesで個人Websiteを作成した。
GitHub Pagesとはなんぞや。アドレスがgithub.ioで終わっている個人サイトで、自分のGitHub リポジトリを使いwebsiteを作ることができる。ドメインを買ったり、自前でサーバを立てなくていい。
少し苦労したが、自分みたいなITど素人でも作成できる。普段からtexを書いていればhtmlも雰囲気で理解できるので、Google Sitesと迷っている人がいれば、ぜひGItHub Pagesを（回し者になりたいけど回し者ではありません）。

econgrad.hatenablog.com

f:id:econgrad:20200102220341j:plain

2021-01-16

22歳を過ぎたあたりから読書について注意していること

つぶやき大学院より中継読書

先日、こんな記事を書いた。

econgrad.hatenablog.com

半年前から、コロナがなければ2020年をどう過ごし、今何をしているか、みたいなことを考えていた訳で、この記事を書いた。

どこまで自分について書くのか、プライベートはことはあまり書いてこなかったもんで、長文編の構成に悩んでいる。

この題については置いといて、22歳23歳ごろ（つまり大学院に入学したころ）から感じ、年々強く感じるようになっているのが、「無駄な読書」についてだ。
学術的知識および能力をひたすら磨きたいので、いわゆる趣味の読書はやめてそっちに全振りしたいが、読みたい本も多い。そして、そういう読書から学ぶことが人生や研究に生きることもある。

そのため、本は読むことにしているが、無駄は本は出来るだけ読まないように、本選びには非常に慎重になった。また、完全に趣味としかならない（と思われる）小説を読むことについても慎重になってしまった。

f:id:econgrad:20181218191920j:plain

econgrad.hatenablog.com

2021-01-11

Covid-19は自分にどのような影響を与えたか：大学院生の振り返り（短文）

つぶやき大学院より中継

最後に更新をしてから、随分と時間が経っておりました。しかも、最後の更新は数年前に公開しようと思っていたものを、コピペしてそのまま更新したものです。

我々は、2020年を生き抜き、本記事を読んでいる人は、2021年を生きています。程度の差はあっても、誰もがCovid-19の影響を受けたと思います。Covidがなければどのような2020年を過ごし、現在何を考え何をしていただろう（counterfactual、知ることが出来ない事実）と考えることもあります。このようなことを考えるのは、Covidによる環境の変化が、自分の生活や意思決定、およびキャリアなどに大きな影響を与えたからであると思います。では、実際にどのような変化を強いられたのか、自分が選んだのか、それは自分にとって良いことなのかそうでないのか。今後について考える上で、自分という人間がBefore Covidとはどう変わったのか、考える必要があるかと思います。時間を設け、これらを洗い出し、文章にすることで、出来るだけ全てを整理したいと思います。It's not an idea until you write it down という言葉がありますが（Ivan Sutherland教授の言葉）、idea（アイデア・着想）だけでなくthought（思考）についてもある程度当てはまると思います。文字にすることで、整理されることがある。

次回の投稿では、『Covid-19は自分にどのような影響を与えたか：大学院生の振り返り（長文）』と題し、つらつらと執筆したものを公開する予定です。econgradおよびルーニーと繋がりがある皆さん、ご笑覧ください。

執筆時間15分（トイレ休憩の5分を含む）

f:id:econgrad:20210111131025j:plain — 昔、神戸三宮にあるとある喫茶点で食べたプリンとトースト（写真右上）です

econgrad.hatenablog.com

2020-10-13

社会科学のためのデータ分析入門章末問題解答(4章-1) Rコード

社会科学のためのデータ分析入門統計学・計量経済学 R

これまでの章の解答はこちら

章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。

欠点なのかは分かりませんが、こちらのテキストには章末問題の解答がついていません。そして日本語でも英語でもwebで公開されていません（2018年冬ごろの時点では）。2018年冬に私が上巻の章末問題を解いたのですが、一度公開してみようと思ったので複数の記事に分けて投稿していこうと思います。誰かの役に立てればとも思っているのですが、私のコードにミスがあった場合に指摘していただけると嬉しいです。
I would highly appreciate if you could point out mistakes.

また同じ変数に関するプロットをする場合でも複数の方法を使ったりもしています。

2章-1 (Chapter2 - Section 1)

スクリプトをベタ張りしています。

## Chapter 4 Prediction
## Exercise Solution

setwd("~/qss/PREDICTION") # ご自身のディレクトリを選択

## -----------------------------------------------------
## The author of this script uses Japanese-Version QSS.
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## Section 1
## Q1

intrade08 <- read.csv("intrade08.csv") 
intrade12 <- read.csv("intrade12.csv")
pres08 <- read.csv("pres08.csv") # Election results
pres12 <- read.csv("pres12.csv")

head(intrade08); head(intrade12)
head(pres08); head(pres12) 
dim(intrade08); dim(intrade12)

## 2008

## calculate Obama's mergin
pres08$margin <- pres08$Obama - pres08$McCain # result

## calculate D-Paety's mergin
intrade08$margin <- intrade08$PriceD - intrade08$PriceR

## convert to a Date object
class(intrade08$day)
intrade08$day <- as.Date(intrade08$day)
class(intrade08$day) # object is changed to "date"

unique(intrade08$day)
(unique(intrade08$state))

poll.pred08 <- rep(NA, 51)
poll.pred08 # this is just an empty vector
## extract unique state names which the loop will iterate through 
st.names <- unique(intrade08$state)

## add state names as labels for easy interpretation later on
names(poll.pred08) <- as.character(st.names) # element name is added 

## loop across 50 states plus DC
for (i in 1:51){
  ## subset the 2008 data
  before <- subset(intrade08, intrade08$day == "2008-11-03")
  before <- subset(before, subset = (state == st.names[i]))
  poll.pred08[i] <- before$margin
}

## predicted winners
poll.pred08

## which state prediction called wrong? 
sign(poll.pred08) != sign(pres08$margin)
pres08$state[sign(poll.pred08) != sign(pres08$margin)]
## what was the actual margin for these states? 
pres08$margin[sign(poll.pred08) != sign(pres08$margin)]

## prediction using poll is more accurate than using market price of gambling

## 2012
## convert to a Date object
intrade12$day <- as.Date(intrade12$day)

## calculate Obama's mergin
pres12$margin <- pres12$Obama - pres12$Romney # result

## calculate D-Paety's mergin
intrade12$margin <- intrade12$PriceD - intrade12$PriceR

poll.pred12 <- rep(NA, 50)

## extract unique state names which the loop will iterate through 
st.names <- unique(intrade12$state)

length(unique(intrade12$state))

## add state names as labels for easy interpretation later on
names(poll.pred12) <- as.character(st.names) # element name is added 

length(poll.pred12)

## loop across 50 states 
for (i in 1:50){
  ## subset the 2012 data
  before <- subset(intrade12, intrade12$day == "2012-11-05")
  before <- subset(before, subset = (state == st.names[i]))
  poll.pred12[i] <- before$margin
}

## predicted winners
poll.pred12

## replace NA in predicted winners by 1
poll.pred12 <- replace(poll.pred12, which(is.na(poll.pred12)), 1)

## because their lengths are different -
## unique(intrade12$state) does not contain DC - 
## we need to omit DC from pres12$margin in order to arrange the lengths
length(pres12$margin); length(poll.pred12); length(unique(intrade12$state))
names(poll.pred08) %in% names(poll.pred12) # find place where state is different
poll.pred08[8]; poll.pred12[8]
poll.pred08[9]; poll.pred12[9] # 9th is DC in intrade12$state

## which state prediction called wrong? 
sign(poll.pred12) != sign(pres12$margin[-9])
pres12$state[sign(poll.pred12) != sign(pres12$margin[-9])]
## what was the actual margin for these states? 
pres12$margin[sign(poll.pred12) != sign(pres12$margin[-9])]

## prediction using poll is still more accurate 
## than using market price of gambling. however, compared to Q1, 
## gambling in 2012 gives better prediction (due to lack of competition)

章末問題解答(1章-1) Rコード
 https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-1www.econ-stat-grad.com
章末問題解答(1章-2) Rコード https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-2www.econ-stat-grad.com

f:id:econgrad:20201012235846p:plain

2020-10-13

社会科学のためのデータ分析入門章末問題解答(3章-3) Rコード

社会科学のためのデータ分析入門統計学・計量経済学 R

これまでの章の解答はこちら

章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。

2章-1 (Chapter2 - Section 1)

スクリプトをベタ張りしています。

## Chapter 3 Measurement
## Exercise Solution

setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択

## -----------------------------------------------------
## The author of this script uses Japanese-Version QSS.
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## Section 3
## Q1

un <- read.csv("unvoting.csv")
unique(un$Year)

## in the year 1980 & 2000
un80 <- un[c(un$Year == "1980"), ]
un00 <- un[c(un$Year == "2000"), ]
dim(un80); dim(un00)

## plot histgrams
hist(un80$idealpoint, col = "#0000ff40", border = "#0000ff")
abline(v = median(un80$idealpoint), lty = "dashed", col = "red")
hist(un00$idealpoint, col = "#ff00ff40", border = "#ff00ff")
abline(v = median(un00$idealpoint), lty = "dashed", col = "blue")

## Extra work: plotting hists in one picture (using "add = TRUE").
hist(un80$idealpoint, col = "#0000ff40", border = "#0000ff",
     xlab = "", main = "Estimated Ideal Point in 1980", 
     xlim = c(-3, 3), ylim = c(0, 60))
abline(v = median(un80$idealpoint), lty = "dashed", col = "red")
text(0.5, 50, "median(-0.09)", col = "red")

hist(un00$idealpoint, col = "#ff00ff40", border = "#ff00ff", 
     add = TRUE)
abline(v = median(un00$idealpoint), lty = "dashed", col = "blue")
text(-0.5, 60, "median(-0.35)", col = "blue")

cols <- c("#0000ff40", "#ff00ff40")
legend("topright", c("1980", "2000"), col = cols, pch = 19)

## calculate quantiles
quantile(un80$idealpoint)
quantile(un00$idealpoint)


## Q2

## time-series plot
trans.us <- tapply(un$PctAgreeUS, un$Year, mean)
trans.russia <- tapply(un$PctAgreeRUSSIA, un$Year, mean)

length(trans.mean)
plot(names(trans.us), trans.us, type = "l", col = "deepskyblue", 
     ylim = c(0, 0.9), xlab = "Year", ylab = "", 
     main = "Transition: Mean Percentage of Agree with US & Russia")
text(2010, 0.1, "US", col = "deepskyblue")
lines(names(trans.russia), trans.russia, type = "l", col = "pink")
text(2010, 0.6, "Russia", col = "pink")

## Find the countries where support the US & Russia the most. 
usMost <- tapply(un$PctAgreeUS, un$CountryName, mean)
russiaMost <- tapply(un$PctAgreeRUSSIA, un$CountryName, mean)

## [2] finds the socond highest element
## We can also get result w/o [2]
sort(usMost, decreasing = TRUE)[2]
sort(russiaMost, decreasing = TRUE)[2]

## FYI: Command which.max gets the place of maximum number.
## However now we need to find the socond best because US and Russia
## have obviously concordance rate 1. 
which.max(russiaMost) # length 143
russiaMost[143] # finds 143th element


## Q3

## Create subsets of US & Russia.
usIde <- subset(un, un$CountryName == "United States of America")
russiaIde <- subset(un, un$CountryName == "Russia")

## Create subsets.
usIde <- usIde[, c("Year", "idealpoint")]
russiaIde <- russiaIde[, c("Year", "idealpoint")]
medianIde <- tapply(un$idealpoint, un$Year, median)

## Plot transition. 
plot(usIde, type = "l", ylim = c(-3, 3), col = "red")
lines(russiaIde, type = "l", col = "blue")
lines(names(medianIde), medianIde, lty = "dashed")
text(2010, 3, "US", col = "red")
text(2010, 0.6, "Russia", col = "blue")
text(2010, -1, "UN-Median")


## Q4

## Soviet Union subset
soviet <- subset(un, subset = (CountryName == "Estonia") | 
         (CountryName == "Latvia") | 
         (CountryName == "Lithuania") | 
         (CountryName == "Belarus") | 
         (CountryName == "Moldova") | 
         (CountryName == "Ukraine") | 
         (CountryName == "Armenia") | 
         (CountryName == "Azerbaijan") | 
         (CountryName == "Georgia") | 
         (CountryName == "Kazakhstan") |
         (CountryName == "Kyrgyzstan") |
         (CountryName == "Tajikistan") |
         (CountryName == "Turkmenistan") |
         (CountryName == "Uzbekistan") |
         (CountryName == "Russia") 
)

## 2012 data
un2012 <- un[c(un$Year == "2012"), ]
soviet2012 <- soviet[c(soviet$Year == "2012"), ]

a <- un2012$CountryName %in% soviet2012$CountryName

## Soviet Union dummy
un2012$sov <- NA
un2012$sov <- ifelse((un2012$CountryName %in% soviet2012$CountryName) == TRUE, 1, 0)

un2012a <- subset(un2012, un2012$sov == 0)
un2012a <- un2012a[, c("idealpoint", "PctAgreeUS")]
soviet2012a <- soviet2012[, c("idealpoint", "PctAgreeUS")]

## Plot 2 groups in 1 picture. 
plot(0, 0, type = "n", 
     xlim = c(min(un2012a$idealpoint), max(un2012a$idealpoint)), 
     ylim = c(min(un2012a$PctAgreeUS), max(un2012a$PctAgreeUS)), 
     xlab = "idealpoint", ylab = "percentage agree with US", 
     main = "Soviet & non-Soviet in 2012")

points(un2012a, pch = 22, col = "blue")
points(soviet2012a, pch = 20, col = "red")

## same way to plot the points. 
points(un2012a$idealpoint, un2012a$PctAgreeUS, pch = 22, col = "deepskyblue") 
points(soviet2012a$idealpoint, soviet2012a$PctAgreeUS, pch = 20, col = "red") 

## There exists storong correlation b/w idealpoint & PctAgreeUS. 
## Tendency seems to be the same b/w Soviet and non-Soviet countries. 


## Q5

## Create non-Soviet subset
un$sov <- NA
un$sov <- ifelse((un$CountryName %in% soviet$CountryName) == TRUE, 1, 0)

un.sov <- subset(un, un$sov == 0)

trans.sov.m <- tapply(soviet$idealpoint, soviet$Year, median)
trans.un.m <- tapply(un.sov$idealpoint, un.sov$Year, median)

## plot lines
plot(names(trans.sov.m), trans.sov.m, type = "l", 
     xlab = "", ylab = "median ideal point", col = "red",
     ylim = c(-3, 1))
lines(names(trans.un.m), trans.un.m, type = "l", col = "blue")
abline(v = 1989, lty = "dashed") # Fall of the Berlin Wall
text(2007, 0.5, "Soviet", col = "red")
text(2000, -0.2, "non-Soviet", col = "blue")
text(1985, 1, "Fall of the Berlin Wall")

f:id:econgrad:20201012235846p:plain

2020-10-13

社会科学のためのデータ分析入門章末問題解答(3章-2) Rコード

社会科学のためのデータ分析入門統計学・計量経済学 R

これまでの章の解答はこちら

章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。

2章-1 (Chapter2 - Section 1)

スクリプトをベタ張りしています。

## Chapter 3 Measurement
## Exercise Solution

setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択

## -----------------------------------------------------
## The author of this script uses Japanese-Version QSS.
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## Section 2
## Q1

vign <- read.csv("vignettes.csv")
head(vign); dim(vign)

china <- subset(vign, vign$china == 1)
mexico <- subset(vign, vign$china != 1)

Cself <- prop.table(table(china$self))
Mself <- prop.table(table(mexico$self))

## Plot barplot of self-evalutaion in China & Mexico.
barplot(Cself, xlab = "Score of Self Evaluation in China", ylab = "ratio")
barplot(Mself, xlab = "Score of Self Evaluation in Mexico", ylab = "ratio")

## Calculate the means. 
Cself; Mself
mean(Cself); mean(Mself) 
## mean political effectiveness is the same, 
## but the shapes of distributions are totally different. 
## The results seem not to be consistent with their political conditions. 


## Q2

hist(china$age, ylim = c(0, 80))
abline(v = median(china$age), lty = "dashed", col = "red")
text(x = 50, y = 60, "median = 45")

hist(mexico$age, ylim = c(0, 80))
abline(v = median(mexico$age), lty = "dashed", col = "red")
text(x = 40, y = 80, "median = 35")

## Extra work: plotting hists in one picture (using "add = TRUE").

hist(china$age, ylim = c(0, 80), xlab = "age", main = "hist of age",
     col = "#0000ff40", border = "#0000ff")
abline(v = median(china$age), lty = "dashed", col = "red")
text(x = 50, y = 60, "median = 45")

hist(mexico$age, ylim = c(0, 80), col = "#ff00ff40", border = "#ff00ff", 
     add = TRUE)
abline(v = median(mexico$age), lty = "dashed", col = "red")
text(x = 40, y = 80, "median = 35")
cols <- c("#0000ff40", "#ff00ff40")
legend("topright", c("China", "Mexico"), col = cols, pch = 19)

## End the extra work. 

## Plot the Q-Q plots.
## China's Q-Q plot
qqplot(china$age, mexico$age)
abline(0, 1)

## First, the Q-Q plot is not on the 45-degree line, 
## which means thier age's distributions are not similar.
## Q-Q line is below the 45-degree line, and thus 
## Mexico's distribution is not that scattered like China counterpart. 


## Q3

## Percentage of respondents who give low self-evaluation
## less than eveluation about moses

## Create dummy variables in China & Mexico. 
Cmo.se <- sum(ifelse(china$moses > china$self, 1, 0))
Cmo.se.l <- length(ifelse(china$moses > china$self, 1, 0)) # no need to use ifelse

Mmo.se <- sum(ifelse(mexico$moses > mexico$self, 1, 0))
Mmo.se.l <- length(ifelse(mexico$moses > mexico$self, 1, 0))

## Answers
Cmo.se / Cmo.se.l # China
Mmo.se / Mmo.se.l # Mexico


## Q4

## Create subsets which have relationship alison > jane > moses. 
china3 <- subset(china, subset = (alison >= jane) & (jane >= moses))
mexico3 <- subset(mexico, subset = (alison >= jane) & (jane >= moses))
vign3 <- subset(vign, subset = (alison >= jane) & (jane >= moses))

nrow(mexico)
nrow(china3)

## Create categorical variables in China & Mexico, and overall. 
## overall
vign3$self.pos <- NA

vign3$self.pos[vign3$moses > vign3$self] <- 1

vign3$self.pos[(vign3$moses == vign3$self) | 
                  ((vign3$moses <= vign3$self) & 
                     (vign3$self < vign3$jane))] <- 2

vign3$self.pos[(vign3$jane == vign3$self) | 
                  ((vign3$jane <= vign3$self) & 
                     (vign3$self < vign3$alison))] <- 3

vign3$self.pos[(vign3$alison <= vign3$self)] <- 4

## China 
china3$self.pos <- NA

china3$self.pos[china3$moses > china3$self] <- 1

china3$self.pos[(china3$moses == china3$self) | 
                  ((china3$moses <= china3$self) & 
                     (china3$self < china3$jane))] <- 2

china3$self.pos[(china3$jane == china3$self) | 
                  ((china3$jane <= china3$self) & 
                     (china3$self < china3$alison))] <- 3

china3$self.pos[(china3$alison <= china3$self)] <- 4

china3$self.pos # confirm

## Mexico

mexico3$self.pop <- NA

mexico3$self.pop[mexico3$moses > mexico3$self] <- 1

mexico3$self.pop[(mexico3$moses == mexico3$self) | 
                   ((mexico3$self >= mexico3$moses) & 
                      (mexico3$jane > mexico3$self))] <- 2

mexico3$self.pop[(mexico3$jane == mexico3$self) | 
                   ((mexico3$self >= mexico3$jane) & 
                      (mexico3$alison > mexico3$self))] <- 3

mexico3$self.pop[(mexico3$self >= mexico3$alison)] <- 4

## bar plot
Vself.pop <- prop.table(table(vign3$self.pos))
Cself.pop <- prop.table(table(china3$self.pos))
Mself.pop <- prop.table(table(mexico3$self.pop))

barplot(Vself.pop)
barplot(Cself.pop)
barplot(Mself.pop)

## mean

mean(Vself.pop); mean(Cself.pop); mean(Mself.pop)


## Q5

## under & over 40
Vu40 <- subset(vign3, vign3$age < 40)
Vo40 <- subset(vign3, vign3$age >= 40)

Cu40 <- subset(china3, china3$age < 40)
Co40 <- subset(china3, china3$age >= 40)

Mu40 <- subset(mexico3, mexico3$age < 40)
Mo40 <- subset(mexico3, mexico3$age >= 40)

## proportion tables
Vu40.pop <- prop.table(table(Vu40$self.pos))
Vo40.pop <- prop.table(table(Vo40$self.pos))

Cu40.pop <- prop.table(table(Cu40$self.pos))
Co40.pop <- prop.table(table(Co40$self.pos))

Mu40.pop <- prop.table(table(Mu40$self.pos))
Mo40.pop <- prop.table(table(Mo40$self.pos))

## barplot(Vu40.pop); barplot(Vo40.pop)
## barplot(Cu40.pop); barplot(Co40.pop)
## barplot(Mu40.pop); barplot(Mo40.pop)

## mean
mean(Vu40.pop); mean(Vo40.pop)
mean(Cu40.pop); mean(Co40.pop)
mean(Mu40.pop); mean(Mo40.pop)

f:id:econgrad:20201012235846p:plain

2020-10-13

社会科学のためのデータ分析入門章末問題解答(3章-1) Rコード

社会科学のためのデータ分析入門統計学・計量経済学 R

これまでの章の解答はこちら

章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。

2章-1 (Chapter2 - Section 1)

スクリプトをベタ張りしています。

## Chapter 3 Measurement
## Exercise Solution

setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択

## -----------------------------------------------------
## The author of this script uses Japanese-Version QSS.
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## Section 1
## Q1

reshaped <- read.csv("gayreshaped.csv")
ccap <- read.csv("ccap2012.csv")

head(reshaped)
head(ccap)

## "complete.obs" perform a calculation for rows which have perfect matching. 
## Correlation is strong b/w the two. 
cor(reshaped$therm1, reshaped$therm2, use = "complete.obs")


## Q2

## Create the subset
study2 <- subset(reshaped, subset = (study == 2) & (treatment == "No Contact"))
study2a <- study2[, 3:6]
class(study2a)

## "pairwise.complete.obs" perform a calculation for a data.frame which
## has several variables & missing values. 
cor(study2a, use = "pairwise.complete.obs") # correlation coefficient matrix

## The result is slightly different form the argument "pairwise.complete.obs". 
## See. http://keita43a.hatenablog.com/entry/2018/04/10/034230. 
cor(study2a, use = "complete.obs")

## All correlation coefficients are nealy 1 (means storong correlation).

## Extra exercise, ploting the matrix using packages psych & corrplot
## install.packages("psych")
library(psych)
psych::pairs.panels(study2a)
psych::pairs.panels(study2a, 
                    hist.col = "white", rug = F, ellipses = F, lm = T)

## install.packages("corrplot")
library(corrplot)
corrplot::corrplot(cor(study2a, use = "complete.obs"))

cor.plot(cor(study2a, use = "complete.obs"), numbers=T)


## Q3

## You might encounter "Error in plot.new() : figure margins too large"
## use dev.new() & dev.off() before and after the command plot

plot(study2a$therm1, study2a$therm2, pch = 20,
     col = "blue", xlab = "Therm 1", ylab = "Therm 2") 

plot(study2a$therm1, study2a$therm3, pch = 20,
     col = "red", xlab = "Therm 1", ylab = "Therm 3") 

plot(study2a$therm1, study2a$therm4, pch = 20,
     col = "green", xlab = "Therm 1", ylab = "Therm 4") 

## Plot & save 3 scatter plots in 1 screan (a bit weird though).
png("aaa.png")
split.screen(figs = c(1, 2))
split.screen(figs = c(2, 1), screen = 2)

screen(1)
plot(study2a$therm1, study2a$therm2, pch = 20,
     col = "blue", xlab = "Therm 1", ylab = "Therm 2") 
screen(3)
plot(study2a$therm1, study2a$therm3, pch = 20,
     col = "red", xlab = "Therm 1", ylab = "Therm 3") 
screen(4)
plot(study2a$therm1, study2a$therm4, pch = 20,
     col = "green", xlab = "Therm 1", ylab = "Therm 4") 
dev.off()

## Get back to 1 screan format. 
## close.screen(all = T)

## Another way to plot. 
## Plot 3 groups in 1 picture. 

plot(0, 0, type = "n", 
     xlim = c(0, max(study2a$therm1)), 
     ylim = c(0, max(study2a$therm1)), 
     xlab = "therm1", ylab = "therm2-4")

points(study2a$therm1, study2a$therm2, pch = 20, col = "red") 
points(study2a$therm1, study2a$therm3, pch = 20, col = "blue") 
points(study2a$therm1, study2a$therm4, pch = 20, col = "green") 

plot(0, 0, type = "n", 
     xlim = c(0, max(study2a$therm1)), 
     ylim = c(0, max(study2a$therm1)), 
     xlab = "therm1", ylab = "therm2-4")

points(study2a$therm1, study2a$therm2, pch = 20, col = "#30303020") 
points(study2a$therm1, study2a$therm3, pch = 20, col = "#ff990020") 
points(study2a$therm1, study2a$therm4, pch = 20, col = "#ffb6c120") 


## Q4

## Plot CCAP's gaytherm
hist(ccap$gaytherm, col = "#0000ff20", las = 1, freq = FALSE, 
     xlab = "therm", main = "Histogram of GayTherm", breaks = 20)

## Plot some lines
abline(v = mean(ccap$gaytherm, na.rm = TRUE), 
       lty = "dashed", col = "red")
text(x = 67, y = 0.030, "mean(58.7)", col = "red")
lines(density(ccap$gaytherm, na.rm = TRUE), col = "orange", lwd = 2)
mean(ccap$gaytherm, na.rm = TRUE)

## Plot study1's gaytherm

head(reshaped)
study1 <- subset(reshaped, reshaped$study == 1)

hist(study1$therm1, col = "#0000ff20", las = 1, freq = FALSE, 
     xlab = "therm", main = "Histogram of Study1's Therm1", breaks = 20)

hist(study2$therm1, col = "#0000ff20", las = 1, freq = FALSE, 
     xlab = "therm", main = "Histogram of Study2's Therm1", breaks = 20)


## Q5

qqplot(study2a$therm1, study2a$therm2, pch = 20, col = "#0000ff") 
abline(0, 1) # 45-degree line
qqplot(study2a$therm1, study2a$therm3, pch = 20, col = "#0000ff") 
abline(0, 1)
qqplot(study2a$therm1, study2a$therm4, pch = 20, col = "#0000ff") 
abline(0, 1)

## It is difficult to put a interpretation. 
## Q-Q plot is almost on the 45-degree line. 
## This means two variables' distributions are almost the same.

f:id:econgrad:20201012235846p:plain

2020-10-13

社会科学のためのデータ分析入門章末問題解答(2章-3) Rコード

社会科学のためのデータ分析入門統計学・計量経済学 R

これまでの章の解答はこちら

章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。

2章-1 (Chapter2 - Section 1)

スクリプトをベタ張りしています。

## Chapter 2 Causality 
## Exercise Solution

setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択

## -----------------------------------------------------
## The author of this script uses Japanese-Version QSS.
## -----------------------------------------------------
## -----------------------------------------------------
## -----------------------------------------------------
## Section 3
## Q1

leader <- read.csv("leaders.csv")

dim(leader)
summary(leader$result)

length(unique(leader$country)) # 88 countries
y <- nrow(leader) # 250 countries in total
z <- length(unique(leader$year)) # n of plots w/o overlaps
y / z # year mean of plots in those 88 countries


## Q2

unique(leader$result)

## Create variable success in the original data frame.
## We use the logical disjunction (|) here 
## because the way to die does not matter. 

leader$success <- NA 
leader$success[leader$result == "dies between a day and a week" 
               | leader$result == "dies between a week and a month" 
               | leader$result == "dies within a day after the attack"
               | leader$result == "dies, timing unknown"
               ] <- 1

leader$success[leader$result == "hospitalization but no permanent disability" 
               | leader$result == "not wounded"
               | leader$result == "plot stopped"
               | leader$result == "survives but wounded severely"
               | leader$result == "survives, whether wounded unknown"
               | leader$result == "wounded lightly"
               ] <- 0

## leader <- leader[, -"sucess"] # Delete wrong variable name

mean(leader$success) # 21.6%

## Success of assassination attempt seems not to be done randomly. 
## Thus, the assumption is not valid. 


## Q3

tapply(leader$politybefore, leader$success, mean)
## The difference b/w the two is about 1.04, seems not to be trivial. 

tapply(leader$age, leader$success, mean)
## Aimed leader among sccuees is 3 years higher than among failure. 


## Q4

## Create a variable warbefore
leader$warbefore <- NA

leader$warbefore[leader$civilwarbefore == 1 
                 | leader$interwarbefore == 1] <- 1

## Notice that we have to use the logical cunjunction (&). 
leader$warbefore[leader$civilwarbefore != 1 
                 & leader$interwarbefore != 1] <- 0

summary(leader$warbefore)

war3b <- subset(leader, leader$warbefore == 1)

tapply(war3b$politybefore, war3b$success, mean)
tapply(war3b$age, war3b$success, mean)

## Among the countries where experience war have higher polity score. 
## Among them, age of ploted leaders are higher among success, diff. is trivial though. 


## Q5

## Create a variable warbefore
leader$warafter <- NA

leader$warafter[leader$civilwarafter == 1
                | leader$interwarafter == 1] <- 1

leader$warafter[leader$civilwarafter != 1
                & leader$interwarafter != 1] <- 0

war3a <- subset(leader, subset = (warafter == 1))

tapply(leader$warafter, leader$success, mean)
tapply(leader$polityafter, leader$success, mean)

## After success the rate of breaking out war is about 20%, 
## while its rate after failure is 30%.

f:id:econgrad:20201012235846p:plain

開発経済学とその他応用分野を学ぶ院生

人間万事塞翁が馬を大切にしている応用経済学徒. 2020年4月から開発・計量・プログラミング関連の記事を書きます.

GitHub Pagesで個人Websiteを作成した

22歳を過ぎたあたりから読書について注意していること

Covid-19は自分にどのような影響を与えたか：大学院生の振り返り（短文）

社会科学のためのデータ分析入門章末問題解答(4章-1) Rコード

これまでの章の解答はこちら

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

2章-1 (Chapter2 - Section 1)

社会科学のためのデータ分析入門章末問題解答(3章-3) Rコード

これまでの章の解答はこちら

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

2章-1 (Chapter2 - Section 1)

社会科学のためのデータ分析入門章末問題解答(3章-2) Rコード

これまでの章の解答はこちら

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

2章-1 (Chapter2 - Section 1)

社会科学のためのデータ分析入門章末問題解答(3章-1) Rコード

これまでの章の解答はこちら

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

2章-1 (Chapter2 - Section 1)

社会科学のためのデータ分析入門章末問題解答(2章-3) Rコード

これまでの章の解答はこちら

はじめに (Textbook Solution: Quantitative Social Science: An Introduction )

2章-1 (Chapter2 - Section 1)