22歳を過ぎたあたりから読書について注意していること
先日、こんな記事を書いた。
半年前から、コロナがなければ2020年をどう過ごし、今何をしているか、みたいなことを考えていた訳で、この記事を書いた。
どこまで自分について書くのか、プライベートはことはあまり書いてこなかったもんで、長文編の構成に悩んでいる。
この題については置いといて、22歳23歳ごろ(つまり大学院に入学したころ)から感じ、年々強く感じるようになっているのが、「無駄な読書」についてだ。
学術的知識および能力をひたすら磨きたいので、いわゆる趣味の読書はやめてそっちに全振りしたいが、読みたい本も多い。そして、そういう読書から学ぶことが人生や研究に生きることもある。
そのため、本は読むことにしているが、無駄は本は出来るだけ読まないように、本選びには非常に慎重になった。また、完全に趣味としかならない(と思われる)小説を読むことについても慎重になってしまった。
Covid-19は自分にどのような影響を与えたか:大学院生の振り返り(短文)
最後に更新をしてから、随分と時間が経っておりました。しかも、最後の更新は数年前に公開しようと思っていたものを、コピペしてそのまま更新したものです。
我々は、2020年を生き抜き、 本記事を読んでいる人は、2021年を生きています。程度の差はあっても、誰もがCovid-19の影響を受けたと思います。Covidがなければどのような2020年を過ごし、現在何を考え何をしていただろう(counterfactual、知ることが出来ない事実)と考えることもあります。このようなことを考えるのは、Covidによる環境の変化が、自分の生活や意思決定、およびキャリアなどに大きな影響を与えたからであると思います。では、実際にどのような変化を強いられたのか、自分が選んだのか、それは自分にとって良いことなのかそうでないのか。今後について考える上で、自分という人間がBefore Covidとはどう変わったのか、考える必要があるかと思います。時間を設け、これらを洗い出し、文章にすることで、出来るだけ全てを整理したいと思います。It's not an idea until you write it down という言葉がありますが(Ivan Sutherland教授の言葉)、idea(アイデア・着想)だけでなくthought(思考)についてもある程度当てはまると思います。文字にすることで、整理されることがある。
次回の投稿では、『Covid-19は自分にどのような影響を与えたか:大学院生の振り返り(長文)』と題し、つらつらと執筆したものを公開する予定です。econgradおよびルーニーと繋がりがある皆さん、ご笑覧ください。
執筆時間15分(トイレ休憩の5分を含む)
社会科学のためのデータ分析入門 章末問題解答(4章-1) Rコード
これまでの章の解答はこちら
章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。
はじめに (Textbook Solution: Quantitative Social Science: An Introduction )
Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。
欠点なのかは分かりませんが、こちらのテキストには章末問題の解答がついていません。そして日本語でも英語でもwebで公開されていません(2018年冬ごろの時点では)。2018年冬に私が上巻の章末問題を解いたのですが、一度公開してみようと思ったので複数の記事に分けて投稿していこうと思います。誰かの役に立てればとも思っているのですが、私のコードにミスがあった場合に指摘していただけると嬉しいです。
I would highly appreciate if you could point out mistakes.
また同じ変数に関するプロットをする場合でも複数の方法を使ったりもしています。
2章-1 (Chapter2 - Section 1)
スクリプトをベタ張りしています。
## Chapter 4 Prediction ## Exercise Solution setwd("~/qss/PREDICTION") # ご自身のディレクトリを選択 ## ----------------------------------------------------- ## The author of this script uses Japanese-Version QSS. ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## Section 1 ## Q1 intrade08 <- read.csv("intrade08.csv") intrade12 <- read.csv("intrade12.csv") pres08 <- read.csv("pres08.csv") # Election results pres12 <- read.csv("pres12.csv") head(intrade08); head(intrade12) head(pres08); head(pres12) dim(intrade08); dim(intrade12) ## 2008 ## calculate Obama's mergin pres08$margin <- pres08$Obama - pres08$McCain # result ## calculate D-Paety's mergin intrade08$margin <- intrade08$PriceD - intrade08$PriceR ## convert to a Date object class(intrade08$day) intrade08$day <- as.Date(intrade08$day) class(intrade08$day) # object is changed to "date" unique(intrade08$day) (unique(intrade08$state)) poll.pred08 <- rep(NA, 51) poll.pred08 # this is just an empty vector ## extract unique state names which the loop will iterate through st.names <- unique(intrade08$state) ## add state names as labels for easy interpretation later on names(poll.pred08) <- as.character(st.names) # element name is added ## loop across 50 states plus DC for (i in 1:51){ ## subset the 2008 data before <- subset(intrade08, intrade08$day == "2008-11-03") before <- subset(before, subset = (state == st.names[i])) poll.pred08[i] <- before$margin } ## predicted winners poll.pred08 ## which state prediction called wrong? sign(poll.pred08) != sign(pres08$margin) pres08$state[sign(poll.pred08) != sign(pres08$margin)] ## what was the actual margin for these states? pres08$margin[sign(poll.pred08) != sign(pres08$margin)] ## prediction using poll is more accurate than using market price of gambling ## 2012 ## convert to a Date object intrade12$day <- as.Date(intrade12$day) ## calculate Obama's mergin pres12$margin <- pres12$Obama - pres12$Romney # result ## calculate D-Paety's mergin intrade12$margin <- intrade12$PriceD - intrade12$PriceR poll.pred12 <- rep(NA, 50) ## extract unique state names which the loop will iterate through st.names <- unique(intrade12$state) length(unique(intrade12$state)) ## add state names as labels for easy interpretation later on names(poll.pred12) <- as.character(st.names) # element name is added length(poll.pred12) ## loop across 50 states for (i in 1:50){ ## subset the 2012 data before <- subset(intrade12, intrade12$day == "2012-11-05") before <- subset(before, subset = (state == st.names[i])) poll.pred12[i] <- before$margin } ## predicted winners poll.pred12 ## replace NA in predicted winners by 1 poll.pred12 <- replace(poll.pred12, which(is.na(poll.pred12)), 1) ## because their lengths are different - ## unique(intrade12$state) does not contain DC - ## we need to omit DC from pres12$margin in order to arrange the lengths length(pres12$margin); length(poll.pred12); length(unique(intrade12$state)) names(poll.pred08) %in% names(poll.pred12) # find place where state is different poll.pred08[8]; poll.pred12[8] poll.pred08[9]; poll.pred12[9] # 9th is DC in intrade12$state ## which state prediction called wrong? sign(poll.pred12) != sign(pres12$margin[-9]) pres12$state[sign(poll.pred12) != sign(pres12$margin[-9])] ## what was the actual margin for these states? pres12$margin[sign(poll.pred12) != sign(pres12$margin[-9])] ## prediction using poll is still more accurate ## than using market price of gambling. however, compared to Q1, ## gambling in 2012 gives better prediction (due to lack of competition)
章末問題解答(1章-1) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-1www.econ-stat-grad.com
章末問題解答(1章-2) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-2www.econ-stat-grad.com
社会科学のためのデータ分析入門 章末問題解答(3章-3) Rコード
これまでの章の解答はこちら
章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。
はじめに (Textbook Solution: Quantitative Social Science: An Introduction )
Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。
欠点なのかは分かりませんが、こちらのテキストには章末問題の解答がついていません。そして日本語でも英語でもwebで公開されていません(2018年冬ごろの時点では)。2018年冬に私が上巻の章末問題を解いたのですが、一度公開してみようと思ったので複数の記事に分けて投稿していこうと思います。誰かの役に立てればとも思っているのですが、私のコードにミスがあった場合に指摘していただけると嬉しいです。
I would highly appreciate if you could point out mistakes.
また同じ変数に関するプロットをする場合でも複数の方法を使ったりもしています。
2章-1 (Chapter2 - Section 1)
スクリプトをベタ張りしています。
## Chapter 3 Measurement ## Exercise Solution setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択 ## ----------------------------------------------------- ## The author of this script uses Japanese-Version QSS. ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## Section 3 ## Q1 un <- read.csv("unvoting.csv") unique(un$Year) ## in the year 1980 & 2000 un80 <- un[c(un$Year == "1980"), ] un00 <- un[c(un$Year == "2000"), ] dim(un80); dim(un00) ## plot histgrams hist(un80$idealpoint, col = "#0000ff40", border = "#0000ff") abline(v = median(un80$idealpoint), lty = "dashed", col = "red") hist(un00$idealpoint, col = "#ff00ff40", border = "#ff00ff") abline(v = median(un00$idealpoint), lty = "dashed", col = "blue") ## Extra work: plotting hists in one picture (using "add = TRUE"). hist(un80$idealpoint, col = "#0000ff40", border = "#0000ff", xlab = "", main = "Estimated Ideal Point in 1980", xlim = c(-3, 3), ylim = c(0, 60)) abline(v = median(un80$idealpoint), lty = "dashed", col = "red") text(0.5, 50, "median(-0.09)", col = "red") hist(un00$idealpoint, col = "#ff00ff40", border = "#ff00ff", add = TRUE) abline(v = median(un00$idealpoint), lty = "dashed", col = "blue") text(-0.5, 60, "median(-0.35)", col = "blue") cols <- c("#0000ff40", "#ff00ff40") legend("topright", c("1980", "2000"), col = cols, pch = 19) ## calculate quantiles quantile(un80$idealpoint) quantile(un00$idealpoint) ## Q2 ## time-series plot trans.us <- tapply(un$PctAgreeUS, un$Year, mean) trans.russia <- tapply(un$PctAgreeRUSSIA, un$Year, mean) length(trans.mean) plot(names(trans.us), trans.us, type = "l", col = "deepskyblue", ylim = c(0, 0.9), xlab = "Year", ylab = "", main = "Transition: Mean Percentage of Agree with US & Russia") text(2010, 0.1, "US", col = "deepskyblue") lines(names(trans.russia), trans.russia, type = "l", col = "pink") text(2010, 0.6, "Russia", col = "pink") ## Find the countries where support the US & Russia the most. usMost <- tapply(un$PctAgreeUS, un$CountryName, mean) russiaMost <- tapply(un$PctAgreeRUSSIA, un$CountryName, mean) ## [2] finds the socond highest element ## We can also get result w/o [2] sort(usMost, decreasing = TRUE)[2] sort(russiaMost, decreasing = TRUE)[2] ## FYI: Command which.max gets the place of maximum number. ## However now we need to find the socond best because US and Russia ## have obviously concordance rate 1. which.max(russiaMost) # length 143 russiaMost[143] # finds 143th element ## Q3 ## Create subsets of US & Russia. usIde <- subset(un, un$CountryName == "United States of America") russiaIde <- subset(un, un$CountryName == "Russia") ## Create subsets. usIde <- usIde[, c("Year", "idealpoint")] russiaIde <- russiaIde[, c("Year", "idealpoint")] medianIde <- tapply(un$idealpoint, un$Year, median) ## Plot transition. plot(usIde, type = "l", ylim = c(-3, 3), col = "red") lines(russiaIde, type = "l", col = "blue") lines(names(medianIde), medianIde, lty = "dashed") text(2010, 3, "US", col = "red") text(2010, 0.6, "Russia", col = "blue") text(2010, -1, "UN-Median") ## Q4 ## Soviet Union subset soviet <- subset(un, subset = (CountryName == "Estonia") | (CountryName == "Latvia") | (CountryName == "Lithuania") | (CountryName == "Belarus") | (CountryName == "Moldova") | (CountryName == "Ukraine") | (CountryName == "Armenia") | (CountryName == "Azerbaijan") | (CountryName == "Georgia") | (CountryName == "Kazakhstan") | (CountryName == "Kyrgyzstan") | (CountryName == "Tajikistan") | (CountryName == "Turkmenistan") | (CountryName == "Uzbekistan") | (CountryName == "Russia") ) ## 2012 data un2012 <- un[c(un$Year == "2012"), ] soviet2012 <- soviet[c(soviet$Year == "2012"), ] a <- un2012$CountryName %in% soviet2012$CountryName ## Soviet Union dummy un2012$sov <- NA un2012$sov <- ifelse((un2012$CountryName %in% soviet2012$CountryName) == TRUE, 1, 0) un2012a <- subset(un2012, un2012$sov == 0) un2012a <- un2012a[, c("idealpoint", "PctAgreeUS")] soviet2012a <- soviet2012[, c("idealpoint", "PctAgreeUS")] ## Plot 2 groups in 1 picture. plot(0, 0, type = "n", xlim = c(min(un2012a$idealpoint), max(un2012a$idealpoint)), ylim = c(min(un2012a$PctAgreeUS), max(un2012a$PctAgreeUS)), xlab = "idealpoint", ylab = "percentage agree with US", main = "Soviet & non-Soviet in 2012") points(un2012a, pch = 22, col = "blue") points(soviet2012a, pch = 20, col = "red") ## same way to plot the points. points(un2012a$idealpoint, un2012a$PctAgreeUS, pch = 22, col = "deepskyblue") points(soviet2012a$idealpoint, soviet2012a$PctAgreeUS, pch = 20, col = "red") ## There exists storong correlation b/w idealpoint & PctAgreeUS. ## Tendency seems to be the same b/w Soviet and non-Soviet countries. ## Q5 ## Create non-Soviet subset un$sov <- NA un$sov <- ifelse((un$CountryName %in% soviet$CountryName) == TRUE, 1, 0) un.sov <- subset(un, un$sov == 0) trans.sov.m <- tapply(soviet$idealpoint, soviet$Year, median) trans.un.m <- tapply(un.sov$idealpoint, un.sov$Year, median) ## plot lines plot(names(trans.sov.m), trans.sov.m, type = "l", xlab = "", ylab = "median ideal point", col = "red", ylim = c(-3, 1)) lines(names(trans.un.m), trans.un.m, type = "l", col = "blue") abline(v = 1989, lty = "dashed") # Fall of the Berlin Wall text(2007, 0.5, "Soviet", col = "red") text(2000, -0.2, "non-Soviet", col = "blue") text(1985, 1, "Fall of the Berlin Wall")
章末問題解答(1章-1) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-1www.econ-stat-grad.com
章末問題解答(1章-2) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-2www.econ-stat-grad.com
社会科学のためのデータ分析入門 章末問題解答(3章-2) Rコード
これまでの章の解答はこちら
章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。
はじめに (Textbook Solution: Quantitative Social Science: An Introduction )
Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。
欠点なのかは分かりませんが、こちらのテキストには章末問題の解答がついていません。そして日本語でも英語でもwebで公開されていません(2018年冬ごろの時点では)。2018年冬に私が上巻の章末問題を解いたのですが、一度公開してみようと思ったので複数の記事に分けて投稿していこうと思います。誰かの役に立てればとも思っているのですが、私のコードにミスがあった場合に指摘していただけると嬉しいです。
I would highly appreciate if you could point out mistakes.
また同じ変数に関するプロットをする場合でも複数の方法を使ったりもしています。
2章-1 (Chapter2 - Section 1)
スクリプトをベタ張りしています。
## Chapter 3 Measurement ## Exercise Solution setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択 ## ----------------------------------------------------- ## The author of this script uses Japanese-Version QSS. ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## Section 2 ## Q1 vign <- read.csv("vignettes.csv") head(vign); dim(vign) china <- subset(vign, vign$china == 1) mexico <- subset(vign, vign$china != 1) Cself <- prop.table(table(china$self)) Mself <- prop.table(table(mexico$self)) ## Plot barplot of self-evalutaion in China & Mexico. barplot(Cself, xlab = "Score of Self Evaluation in China", ylab = "ratio") barplot(Mself, xlab = "Score of Self Evaluation in Mexico", ylab = "ratio") ## Calculate the means. Cself; Mself mean(Cself); mean(Mself) ## mean political effectiveness is the same, ## but the shapes of distributions are totally different. ## The results seem not to be consistent with their political conditions. ## Q2 hist(china$age, ylim = c(0, 80)) abline(v = median(china$age), lty = "dashed", col = "red") text(x = 50, y = 60, "median = 45") hist(mexico$age, ylim = c(0, 80)) abline(v = median(mexico$age), lty = "dashed", col = "red") text(x = 40, y = 80, "median = 35") ## Extra work: plotting hists in one picture (using "add = TRUE"). hist(china$age, ylim = c(0, 80), xlab = "age", main = "hist of age", col = "#0000ff40", border = "#0000ff") abline(v = median(china$age), lty = "dashed", col = "red") text(x = 50, y = 60, "median = 45") hist(mexico$age, ylim = c(0, 80), col = "#ff00ff40", border = "#ff00ff", add = TRUE) abline(v = median(mexico$age), lty = "dashed", col = "red") text(x = 40, y = 80, "median = 35") cols <- c("#0000ff40", "#ff00ff40") legend("topright", c("China", "Mexico"), col = cols, pch = 19) ## End the extra work. ## Plot the Q-Q plots. ## China's Q-Q plot qqplot(china$age, mexico$age) abline(0, 1) ## First, the Q-Q plot is not on the 45-degree line, ## which means thier age's distributions are not similar. ## Q-Q line is below the 45-degree line, and thus ## Mexico's distribution is not that scattered like China counterpart. ## Q3 ## Percentage of respondents who give low self-evaluation ## less than eveluation about moses ## Create dummy variables in China & Mexico. Cmo.se <- sum(ifelse(china$moses > china$self, 1, 0)) Cmo.se.l <- length(ifelse(china$moses > china$self, 1, 0)) # no need to use ifelse Mmo.se <- sum(ifelse(mexico$moses > mexico$self, 1, 0)) Mmo.se.l <- length(ifelse(mexico$moses > mexico$self, 1, 0)) ## Answers Cmo.se / Cmo.se.l # China Mmo.se / Mmo.se.l # Mexico ## Q4 ## Create subsets which have relationship alison > jane > moses. china3 <- subset(china, subset = (alison >= jane) & (jane >= moses)) mexico3 <- subset(mexico, subset = (alison >= jane) & (jane >= moses)) vign3 <- subset(vign, subset = (alison >= jane) & (jane >= moses)) nrow(mexico) nrow(china3) ## Create categorical variables in China & Mexico, and overall. ## overall vign3$self.pos <- NA vign3$self.pos[vign3$moses > vign3$self] <- 1 vign3$self.pos[(vign3$moses == vign3$self) | ((vign3$moses <= vign3$self) & (vign3$self < vign3$jane))] <- 2 vign3$self.pos[(vign3$jane == vign3$self) | ((vign3$jane <= vign3$self) & (vign3$self < vign3$alison))] <- 3 vign3$self.pos[(vign3$alison <= vign3$self)] <- 4 ## China china3$self.pos <- NA china3$self.pos[china3$moses > china3$self] <- 1 china3$self.pos[(china3$moses == china3$self) | ((china3$moses <= china3$self) & (china3$self < china3$jane))] <- 2 china3$self.pos[(china3$jane == china3$self) | ((china3$jane <= china3$self) & (china3$self < china3$alison))] <- 3 china3$self.pos[(china3$alison <= china3$self)] <- 4 china3$self.pos # confirm ## Mexico mexico3$self.pop <- NA mexico3$self.pop[mexico3$moses > mexico3$self] <- 1 mexico3$self.pop[(mexico3$moses == mexico3$self) | ((mexico3$self >= mexico3$moses) & (mexico3$jane > mexico3$self))] <- 2 mexico3$self.pop[(mexico3$jane == mexico3$self) | ((mexico3$self >= mexico3$jane) & (mexico3$alison > mexico3$self))] <- 3 mexico3$self.pop[(mexico3$self >= mexico3$alison)] <- 4 ## bar plot Vself.pop <- prop.table(table(vign3$self.pos)) Cself.pop <- prop.table(table(china3$self.pos)) Mself.pop <- prop.table(table(mexico3$self.pop)) barplot(Vself.pop) barplot(Cself.pop) barplot(Mself.pop) ## mean mean(Vself.pop); mean(Cself.pop); mean(Mself.pop) ## Q5 ## under & over 40 Vu40 <- subset(vign3, vign3$age < 40) Vo40 <- subset(vign3, vign3$age >= 40) Cu40 <- subset(china3, china3$age < 40) Co40 <- subset(china3, china3$age >= 40) Mu40 <- subset(mexico3, mexico3$age < 40) Mo40 <- subset(mexico3, mexico3$age >= 40) ## proportion tables Vu40.pop <- prop.table(table(Vu40$self.pos)) Vo40.pop <- prop.table(table(Vo40$self.pos)) Cu40.pop <- prop.table(table(Cu40$self.pos)) Co40.pop <- prop.table(table(Co40$self.pos)) Mu40.pop <- prop.table(table(Mu40$self.pos)) Mo40.pop <- prop.table(table(Mo40$self.pos)) ## barplot(Vu40.pop); barplot(Vo40.pop) ## barplot(Cu40.pop); barplot(Co40.pop) ## barplot(Mu40.pop); barplot(Mo40.pop) ## mean mean(Vu40.pop); mean(Vo40.pop) mean(Cu40.pop); mean(Co40.pop) mean(Mu40.pop); mean(Mo40.pop)
章末問題解答(1章-1) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-1www.econ-stat-grad.com
章末問題解答(1章-2) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-2www.econ-stat-grad.com
社会科学のためのデータ分析入門 章末問題解答(3章-1) Rコード
これまでの章の解答はこちら
章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。
はじめに (Textbook Solution: Quantitative Social Science: An Introduction )
Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。
欠点なのかは分かりませんが、こちらのテキストには章末問題の解答がついていません。そして日本語でも英語でもwebで公開されていません(2018年冬ごろの時点では)。2018年冬に私が上巻の章末問題を解いたのですが、一度公開してみようと思ったので複数の記事に分けて投稿していこうと思います。誰かの役に立てればとも思っているのですが、私のコードにミスがあった場合に指摘していただけると嬉しいです。
I would highly appreciate if you could point out mistakes.
また同じ変数に関するプロットをする場合でも複数の方法を使ったりもしています。
2章-1 (Chapter2 - Section 1)
スクリプトをベタ張りしています。
## Chapter 3 Measurement ## Exercise Solution setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択 ## ----------------------------------------------------- ## The author of this script uses Japanese-Version QSS. ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## Section 1 ## Q1 reshaped <- read.csv("gayreshaped.csv") ccap <- read.csv("ccap2012.csv") head(reshaped) head(ccap) ## "complete.obs" perform a calculation for rows which have perfect matching. ## Correlation is strong b/w the two. cor(reshaped$therm1, reshaped$therm2, use = "complete.obs") ## Q2 ## Create the subset study2 <- subset(reshaped, subset = (study == 2) & (treatment == "No Contact")) study2a <- study2[, 3:6] class(study2a) ## "pairwise.complete.obs" perform a calculation for a data.frame which ## has several variables & missing values. cor(study2a, use = "pairwise.complete.obs") # correlation coefficient matrix ## The result is slightly different form the argument "pairwise.complete.obs". ## See. http://keita43a.hatenablog.com/entry/2018/04/10/034230. cor(study2a, use = "complete.obs") ## All correlation coefficients are nealy 1 (means storong correlation). ## Extra exercise, ploting the matrix using packages psych & corrplot ## install.packages("psych") library(psych) psych::pairs.panels(study2a) psych::pairs.panels(study2a, hist.col = "white", rug = F, ellipses = F, lm = T) ## install.packages("corrplot") library(corrplot) corrplot::corrplot(cor(study2a, use = "complete.obs")) cor.plot(cor(study2a, use = "complete.obs"), numbers=T) ## Q3 ## You might encounter "Error in plot.new() : figure margins too large" ## use dev.new() & dev.off() before and after the command plot plot(study2a$therm1, study2a$therm2, pch = 20, col = "blue", xlab = "Therm 1", ylab = "Therm 2") plot(study2a$therm1, study2a$therm3, pch = 20, col = "red", xlab = "Therm 1", ylab = "Therm 3") plot(study2a$therm1, study2a$therm4, pch = 20, col = "green", xlab = "Therm 1", ylab = "Therm 4") ## Plot & save 3 scatter plots in 1 screan (a bit weird though). png("aaa.png") split.screen(figs = c(1, 2)) split.screen(figs = c(2, 1), screen = 2) screen(1) plot(study2a$therm1, study2a$therm2, pch = 20, col = "blue", xlab = "Therm 1", ylab = "Therm 2") screen(3) plot(study2a$therm1, study2a$therm3, pch = 20, col = "red", xlab = "Therm 1", ylab = "Therm 3") screen(4) plot(study2a$therm1, study2a$therm4, pch = 20, col = "green", xlab = "Therm 1", ylab = "Therm 4") dev.off() ## Get back to 1 screan format. ## close.screen(all = T) ## Another way to plot. ## Plot 3 groups in 1 picture. plot(0, 0, type = "n", xlim = c(0, max(study2a$therm1)), ylim = c(0, max(study2a$therm1)), xlab = "therm1", ylab = "therm2-4") points(study2a$therm1, study2a$therm2, pch = 20, col = "red") points(study2a$therm1, study2a$therm3, pch = 20, col = "blue") points(study2a$therm1, study2a$therm4, pch = 20, col = "green") plot(0, 0, type = "n", xlim = c(0, max(study2a$therm1)), ylim = c(0, max(study2a$therm1)), xlab = "therm1", ylab = "therm2-4") points(study2a$therm1, study2a$therm2, pch = 20, col = "#30303020") points(study2a$therm1, study2a$therm3, pch = 20, col = "#ff990020") points(study2a$therm1, study2a$therm4, pch = 20, col = "#ffb6c120") ## Q4 ## Plot CCAP's gaytherm hist(ccap$gaytherm, col = "#0000ff20", las = 1, freq = FALSE, xlab = "therm", main = "Histogram of GayTherm", breaks = 20) ## Plot some lines abline(v = mean(ccap$gaytherm, na.rm = TRUE), lty = "dashed", col = "red") text(x = 67, y = 0.030, "mean(58.7)", col = "red") lines(density(ccap$gaytherm, na.rm = TRUE), col = "orange", lwd = 2) mean(ccap$gaytherm, na.rm = TRUE) ## Plot study1's gaytherm head(reshaped) study1 <- subset(reshaped, reshaped$study == 1) hist(study1$therm1, col = "#0000ff20", las = 1, freq = FALSE, xlab = "therm", main = "Histogram of Study1's Therm1", breaks = 20) hist(study2$therm1, col = "#0000ff20", las = 1, freq = FALSE, xlab = "therm", main = "Histogram of Study2's Therm1", breaks = 20) ## Q5 qqplot(study2a$therm1, study2a$therm2, pch = 20, col = "#0000ff") abline(0, 1) # 45-degree line qqplot(study2a$therm1, study2a$therm3, pch = 20, col = "#0000ff") abline(0, 1) qqplot(study2a$therm1, study2a$therm4, pch = 20, col = "#0000ff") abline(0, 1) ## It is difficult to put a interpretation. ## Q-Q plot is almost on the 45-degree line. ## This means two variables' distributions are almost the same.
章末問題解答(1章-1) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-1www.econ-stat-grad.com
章末問題解答(1章-2) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-2www.econ-stat-grad.com
社会科学のためのデータ分析入門 章末問題解答(2章-3) Rコード
これまでの章の解答はこちら
章末問題解答(1章-1)と(1章-2)のRコードの記事はこちらこちら(1章-1)(1章-2)から確認できます。
はじめに (Textbook Solution: Quantitative Social Science: An Introduction )
Rを使った統計学の日本語のテキストとして非常に定評のある社会科学のためのデータ分析入門の章末問題の解答(Rコード)です。
欠点なのかは分かりませんが、こちらのテキストには章末問題の解答がついていません。そして日本語でも英語でもwebで公開されていません(2018年冬ごろの時点では)。2018年冬に私が上巻の章末問題を解いたのですが、一度公開してみようと思ったので複数の記事に分けて投稿していこうと思います。誰かの役に立てればとも思っているのですが、私のコードにミスがあった場合に指摘していただけると嬉しいです。
I would highly appreciate if you could point out mistakes.
また同じ変数に関するプロットをする場合でも複数の方法を使ったりもしています。
2章-1 (Chapter2 - Section 1)
スクリプトをベタ張りしています。
## Chapter 2 Causality ## Exercise Solution setwd("~/qss/CAUSALITY") # ご自身のディレクトリを選択 ## ----------------------------------------------------- ## The author of this script uses Japanese-Version QSS. ## ----------------------------------------------------- ## ----------------------------------------------------- ## ----------------------------------------------------- ## Section 3 ## Q1 leader <- read.csv("leaders.csv") dim(leader) summary(leader$result) length(unique(leader$country)) # 88 countries y <- nrow(leader) # 250 countries in total z <- length(unique(leader$year)) # n of plots w/o overlaps y / z # year mean of plots in those 88 countries ## Q2 unique(leader$result) ## Create variable success in the original data frame. ## We use the logical disjunction (|) here ## because the way to die does not matter. leader$success <- NA leader$success[leader$result == "dies between a day and a week" | leader$result == "dies between a week and a month" | leader$result == "dies within a day after the attack" | leader$result == "dies, timing unknown" ] <- 1 leader$success[leader$result == "hospitalization but no permanent disability" | leader$result == "not wounded" | leader$result == "plot stopped" | leader$result == "survives but wounded severely" | leader$result == "survives, whether wounded unknown" | leader$result == "wounded lightly" ] <- 0 ## leader <- leader[, -"sucess"] # Delete wrong variable name mean(leader$success) # 21.6% ## Success of assassination attempt seems not to be done randomly. ## Thus, the assumption is not valid. ## Q3 tapply(leader$politybefore, leader$success, mean) ## The difference b/w the two is about 1.04, seems not to be trivial. tapply(leader$age, leader$success, mean) ## Aimed leader among sccuees is 3 years higher than among failure. ## Q4 ## Create a variable warbefore leader$warbefore <- NA leader$warbefore[leader$civilwarbefore == 1 | leader$interwarbefore == 1] <- 1 ## Notice that we have to use the logical cunjunction (&). leader$warbefore[leader$civilwarbefore != 1 & leader$interwarbefore != 1] <- 0 summary(leader$warbefore) war3b <- subset(leader, leader$warbefore == 1) tapply(war3b$politybefore, war3b$success, mean) tapply(war3b$age, war3b$success, mean) ## Among the countries where experience war have higher polity score. ## Among them, age of ploted leaders are higher among success, diff. is trivial though. ## Q5 ## Create a variable warbefore leader$warafter <- NA leader$warafter[leader$civilwarafter == 1 | leader$interwarafter == 1] <- 1 leader$warafter[leader$civilwarafter != 1 & leader$interwarafter != 1] <- 0 war3a <- subset(leader, subset = (warafter == 1)) tapply(leader$warafter, leader$success, mean) tapply(leader$polityafter, leader$success, mean) ## After success the rate of breaking out war is about 20%, ## while its rate after failure is 30%.
章末問題解答(1章-1) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-1www.econ-stat-grad.com
章末問題解答(1章-2) Rコード
https://www.econ-stat-grad.com/entry/statistics/qss/solution/ch1-2www.econ-stat-grad.com