This vignette describes how time series can be derived from a topic model using the documents’ dates and, optionally, the documents’ sentiment. Please refer to the “Basic usage” vignette for an introduction to topic model estimation.
The example dataset included in the package has a
docvars variable, .date, which holds the date of
each document. To compute sentiment time series, a sentiment value per
document is also required. The sentiment can be assigned using the
sentopics_sentiment() helper function.
sentopics_sentiment() and sentopics_date() can
also be used to retrieve the documents’ sentiment and date. For this example, we
compute sentiment using the compute_PicaultRenault_scores()
function.
library("xts")
library("data.table")
library("sentopics")
data("ECB_press_conferences_tokens")
head(docvars(ECB_press_conferences_tokens))
# .date doc_id title
# 1 1998-06-09 1 ECB Press conference: Introductory statement
# 2 1998-06-09 1 ECB Press conference: Introductory statement
# 3 1998-06-09 1 ECB Press conference: Introductory statement
# 4 1998-06-09 1 ECB Press conference: Introductory statement
# 5 1998-06-09 1 ECB Press conference: Introductory statement
# 6 1998-06-09 1 ECB Press conference: Introductory statement
# section_title
# 1 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 2 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 3 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 4 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 5 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 6 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
set.seed(123)
lda <- LDA(ECB_press_conferences_tokens, K = 9, alpha = 1, beta = 0.001)
head(sentopics_date(lda))
# .id .date
# <char> <Date>
# 1: 1_1 1998-06-09
# 2: 1_2 1998-06-09
# 3: 1_3 1998-06-09
# 4: 1_4 1998-06-09
# 5: 1_5 1998-06-09
# 6: 1_6 1998-06-09
# Compute sentiment on the corpus
data("ECB_press_conferences")
scores <- compute_PicaultRenault_scores(ECB_press_conferences)
print(head(scores))
# MP EC
# 1_1 0.000000 0.000000
# 1_2 0.000000 0.000000
# 1_3 0.000000 0.000000
# 1_4 0.000000 3.323077
# 1_5 -1.694915 0.800000
# 1_6 0.000000 0.000000
sentopics_sentiment(lda) <- scores[names(ECB_press_conferences_tokens), "EC"]
head(sentopics_sentiment(lda))
# .id .sentiment
# <char> <num>
# 1: 1_1 0.000000
# 2: 1_2 0.000000
# 3: 1_3 0.000000
# 4: 1_4 3.323077
# 5: 1_5 0.800000
# 6: 1_6 0.000000
For this example, the documents’ sentiment was computed using the
sentometrics package. For further details on this sentiment
computation, please refer to the script used in /data-raw/
on GitHub.
Now that the lda object contains dates and sentiment, we
already have enough information to compute a sentiment index using
sentiment_series(), which aggregates documents per period. By
default, it returns an xts object.
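
The idea of aggregating documents per period can be sketched in base R. This is only an illustration on hypothetical toy data (the `docs` data frame and its values are made up), not the package's implementation: it averages sentiment per month and smooths the result with a trailing moving average, and it does not reproduce any scaling or other processing that sentiment_series() may apply.

```r
# Toy data (hypothetical): five documents with a date and a sentiment score.
docs <- data.frame(
  date      = as.Date(c("2020-01-05", "2020-01-20", "2020-02-10",
                        "2020-03-03", "2020-03-28")),
  sentiment = c(1, 3, -2, 0, 4)
)

# Assign each document to the first day of its month.
docs$month <- as.Date(format(docs$date, "%Y-%m-01"))

# Average the documents' sentiment within each month.
monthly <- aggregate(sentiment ~ month, data = docs, FUN = mean)

# Smooth with a trailing two-month moving average. The first value is NA
# because the window is incomplete there.
monthly$rolling <- as.numeric(
  stats::filter(monthly$sentiment, rep(1 / 2, 2), sides = 1)
)

print(monthly)
```

The leading NA produced by the incomplete window is presumably why the calls further down wrap their results in na.omit() when a rolling window is used.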
Estimating the topic model makes it possible to enrich this sentiment series with topical content. The model should be estimated until it returns satisfactory topics. Labeling the topics facilitates the subsequent analysis.
lda <- fit(lda, 1000)
sentopics_labels(lda) <- list(
topic = c(
"Economic growth & Inflation", "Banking", "Payment services",
"European single market", "Monetary policy & Negative rate",
"Monetary policy & Price stability", "Others", "Banking supervision",
"Financial markets"
)
)
plot(lda)
The estimated topic model adds a layer of topic proportions to the
existing documents. This appears clearly when calling melt()
on the model. By leveraging the topic and sentiment information at the
document level, we can compute the share of sentiment that belongs to a
given topic.
document_datas <- sentopics::melt(lda, include_docvars = TRUE)
head(document_datas)
# topic prob .date .id doc_id
# <fctr> <num> <Date> <char> <char>
# 1: Economic growth & Inflation 0.07692308 1998-06-09 1_1 1
# 2: Economic growth & Inflation 0.03125000 1998-06-09 1_2 1
# 3: Economic growth & Inflation 0.06666667 1998-06-09 1_3 1
# 4: Economic growth & Inflation 0.10526316 1998-06-09 1_4 1
# 5: Economic growth & Inflation 0.05000000 1998-06-09 1_5 1
# 6: Economic growth & Inflation 0.04347826 1998-06-09 1_6 1
# title
# <char>
# 1: ECB Press conference: Introductory statement
# 2: ECB Press conference: Introductory statement
# 3: ECB Press conference: Introductory statement
# 4: ECB Press conference: Introductory statement
# 5: ECB Press conference: Introductory statement
# 6: ECB Press conference: Introductory statement
# section_title
# <char>
# 1: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 2: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 3: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 4: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 5: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 6: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# .sentiment .sentiment_scaled
# <num> <num>
# 1: 0.000000 -0.2878194
# 2: 0.000000 -0.2878194
# 3: 0.000000 -0.2878194
# 4: 3.323077 13.9981400
# 5: 0.800000 3.1513930
# 6: 0.000000 -0.2878194
head(document_datas[, list(.date, topic, share_of_sentiment = prob * .sentiment), keyby = ".id"])
# Key: <.id>
# .id .date topic share_of_sentiment
# <char> <Date> <fctr> <num>
# 1: 100_1 2006-06-08 Economic growth & Inflation 0
# 2: 100_1 2006-06-08 Banking 0
# 3: 100_1 2006-06-08 Payment services 0
# 4: 100_1 2006-06-08 European single market 0
# 5: 100_1 2006-06-08 Monetary policy & Negative rate 0
# 6: 100_1 2006-06-08 Monetary policy & Price stability 0
Using this share of sentiment and the documents’ dates, one can
compute two additional outputs: a breakdown of the sentiment time series
and a time series of the sentiment expressed by each topic. The
difference between the two outputs lies in how documents are
aggregated. The breakdown averages the documents’ shares of sentiment with
equal weights, whereas the sentiment expressed by a topic
weights documents by their attention to that topic.
These two aggregations are implemented in the
sentiment_breakdown() and sentiment_topics()
functions.
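
The two weighting schemes can be compared on a toy example. The sketch below follows the description above on hypothetical inputs (the `theta` matrix and `sentiment` vector are made up) for documents of a single period. It is not the package's code, and it ignores rolling windows and any scaling.

```r
# Toy inputs (hypothetical): topic proportions for three documents of the
# same period (each row sums to 1) and one sentiment value per document.
theta <- matrix(
  c(0.8, 0.2,
    0.5, 0.5,
    0.1, 0.9),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("topicA", "topicB"))
)
sentiment <- c(doc1 = 2, doc2 = -1, doc3 = 1)

# Share of each document's sentiment attributed to each topic
# (row i of theta is multiplied by sentiment[i]).
share <- theta * sentiment

# Breakdown: equal-weight average of the shares across documents.
breakdown <- colMeans(share)

# Topic sentiment: documents weighted by their attention to the topic.
topic_sentiment <- colSums(share) / colSums(theta)

# Because each row of theta sums to 1, the breakdown components add up to
# the average document sentiment.
stopifnot(isTRUE(all.equal(sum(breakdown), mean(sentiment))))

print(breakdown)
print(topic_sentiment)
```

Under equal weighting the components sum to the aggregate sentiment. The same holds in the sentiment_breakdown() output that follows, where the topic columns of each row add up to the sentiment column. The attention-weighted values do not have that property, since each topic is normalized by its own total attention.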
head(na.omit(sentiment_breakdown(lda, period = "month", rolling_window = 6)))
# sentiment Economic growth & Inflation Banking
# 1998-11-01 0.09010008 -0.01904483 0.035112177
# 1998-12-01 -0.45559160 -0.05440040 -0.014552201
# 1999-01-01 -0.42761571 -0.04113759 0.002129632
# 1999-02-01 -0.41344980 -0.03354954 -0.003704515
# 1999-03-01 -0.61087669 -0.06876449 -0.031204869
# 1999-04-01 -0.68336368 -0.09674215 -0.042874486
# Payment services European single market
# 1998-11-01 -0.01152206 0.007737369
# 1998-12-01 -0.03587492 -0.045447569
# 1999-01-01 -0.02966671 -0.058568426
# 1999-02-01 -0.03432780 -0.064975734
# 1999-03-01 -0.04840891 -0.080776954
# 1999-04-01 -0.05984233 -0.088258889
# Monetary policy & Negative rate Monetary policy & Price stability
# 1998-11-01 0.03417696 -0.009402267
# 1998-12-01 -0.19745278 -0.058856521
# 1999-01-01 -0.18678102 -0.056751272
# 1999-02-01 -0.15999806 -0.054176632
# 1999-03-01 -0.13857232 -0.064260554
# 1999-04-01 -0.12059812 -0.073725054
# Others Banking supervision Financial markets
# 1998-11-01 -0.02328005 -0.002840683 0.07916346
# 1998-12-01 -0.06161078 -0.035188863 0.04779243
# 1999-01-01 -0.04412574 -0.029959837 0.01724525
# 1999-02-01 -0.04344085 -0.034308726 0.01503206
# 1999-03-01 -0.04549492 -0.062533187 -0.07086049
# 1999-04-01 -0.03478823 -0.067518785 -0.09901563
head(na.omit(sentiment_topics(lda, period = "month", rolling_window = 6)))
# Economic growth & Inflation Banking Payment services
# 1998-11-01 0.01447047 0.55534154 0.009189361
# 1998-12-01 -0.56028053 0.03286343 -0.345959422
# 1999-01-01 -0.40974223 0.18691915 -0.283770567
# 1999-02-01 -0.33352783 0.11930929 -0.370911214
# 1999-03-01 -0.69217666 -0.20873019 -0.584418031
# 1999-04-01 -0.83630254 -0.32310287 -0.701357848
# European single market Monetary policy & Negative rate
# 1998-11-01 0.1256498 -0.08130855
# 1998-12-01 -0.6418995 -0.65763798
# 1999-01-01 -0.7690328 -0.62406240
# 1999-02-01 -0.8116283 -0.54338464
# 1999-03-01 -0.9940525 -0.51557313
# 1999-04-01 -1.0203800 -0.44942975
# Monetary policy & Price stability Others Banking supervision
# 1998-11-01 0.005121783 -0.2596372 0.07700759
# 1998-12-01 -0.605566300 -0.5873370 -0.51761060
# 1999-01-01 -0.583579101 -0.4442754 -0.45029075
# 1999-02-01 -0.549137776 -0.4530577 -0.46669566
# 1999-03-01 -0.655754068 -0.4757660 -0.72875939
# 1999-04-01 -0.745984527 -0.4240332 -0.75708173
# Financial markets
# 1998-11-01 0.78480930
# 1998-12-01 0.31233733
# 1999-01-01 -0.04814063
# 1999-02-01 -0.03424972
# 1999-03-01 -0.70150195
# 1999-04-01 -0.88836737
Furthermore, these functions have plotting counterparts that are
directly accessible using the plot_ prefix.