This vignette describes how time series can be derived from a topic model using the documents’ dates and, optionally, their sentiment. Please refer to the “Basic usage” vignette for an introduction to topic model estimation.

Document dates and sentiment

The example dataset included in the package has a docvars variable, .date, that holds the date of each document. To compute sentiment time series, a sentiment value per document is also required. It can be assigned with the replacement form of the sentopics_sentiment() helper function (sentopics_sentiment(x) <- value). Called on their own, sentopics_sentiment() and sentopics_date() retrieve the documents’ sentiment and dates. For this example, we compute sentiment using the compute_PicaultRenault_scores() function.

library("xts")
library("data.table")
library("sentopics")
data("ECB_press_conferences_tokens")
head(docvars(ECB_press_conferences_tokens))
#        .date doc_id                                        title
# 1 1998-06-09      1 ECB Press conference: Introductory statement
# 2 1998-06-09      1 ECB Press conference: Introductory statement
# 3 1998-06-09      1 ECB Press conference: Introductory statement
# 4 1998-06-09      1 ECB Press conference: Introductory statement
# 5 1998-06-09      1 ECB Press conference: Introductory statement
# 6 1998-06-09      1 ECB Press conference: Introductory statement
#                                                               section_title
# 1 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 2 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 3 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 4 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 5 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 6 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
set.seed(123)
# Initialize an LDA model with 9 topics; it is estimated later with fit()
lda <- LDA(ECB_press_conferences_tokens, K = 9, alpha = 1, beta = 0.001)
head(sentopics_date(lda))
#       .id      .date
#    <char>     <Date>
# 1:    1_1 1998-06-09
# 2:    1_2 1998-06-09
# 3:    1_3 1998-06-09
# 4:    1_4 1998-06-09
# 5:    1_5 1998-06-09
# 6:    1_6 1998-06-09

# Compute sentiment scores on the (untokenized) corpus
data("ECB_press_conferences")
scores <- compute_PicaultRenault_scores(ECB_press_conferences)
print(head(scores))
#            MP       EC
# 1_1  0.000000 0.000000
# 1_2  0.000000 0.000000
# 1_3  0.000000 0.000000
# 1_4  0.000000 3.323077
# 1_5 -1.694915 0.800000
# 1_6  0.000000 0.000000
# Keep the EC scores, reordered to match the documents of the model
sentopics_sentiment(lda) <- scores[names(ECB_press_conferences_tokens), "EC"]
head(sentopics_sentiment(lda))
#       .id .sentiment
#    <char>      <num>
# 1:    1_1   0.000000
# 2:    1_2   0.000000
# 3:    1_3   0.000000
# 4:    1_4   3.323077
# 5:    1_5   0.800000
# 6:    1_6   0.000000
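The indexing scores[names(ECB_press_conferences_tokens), "EC"] does two jobs at once: it picks the EC column and reorders the rows by document name, so that each score lands on the right document. A minimal base R sketch of that alignment, using a hypothetical toy matrix and hypothetical document names rather than the package data:

```r
# Toy score matrix whose rows are NOT in corpus order (hypothetical values)
scores <- matrix(
  c(0.5, -1.0, 2.0,   # MP column
    1.5,  0.0, -0.5), # EC column
  ncol = 2,
  dimnames = list(c("doc_3", "doc_1", "doc_2"), c("MP", "EC"))
)
# Hypothetical document names in corpus order (stand-in for names(tokens))
doc_names <- c("doc_1", "doc_2", "doc_3")

# Check up front that every document has a score; otherwise the indexing
# below would stop with a less explicit "subscript out of bounds" error
stopifnot(all(doc_names %in% rownames(scores)))

# Index rows by name to reorder, then take the EC column as a named vector
ec <- scores[doc_names, "EC"]
ec
# ec is c(doc_1 = 0, doc_2 = -0.5, doc_3 = 1.5)
```

Positional indexing (for example scores[, "EC"]) would silently keep the matrix's own row order, which is why indexing by names is the safer choice when the two objects may be ordered differently.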

For this example, the documents’ sentiment was computed using the sentometrics package. For further details on this sentiment computation, please refer to the script in /data-raw/ on GitHub.

Now that the lda object contains dates and sentiment, it already holds enough information to compute a sentiment index with sentiment_series(), which aggregates the documents’ sentiment per period. By default, it returns an xts object.

xts_sent <- sentiment_series(lda, period = "month", rolling_window = 6)
plot(xts_sent)
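To make the period and rolling_window arguments concrete, here is a base R sketch of one plausible aggregation: average document sentiment within each month, then smooth with a trailing rolling mean over a number of periods. This is an assumption about what sentiment_series() does, not its actual code, and the documents below are hypothetical. It is consistent with the output above, where rolling_window = 6 and documents starting in June 1998 give a first complete value in November 1998.

```r
# Hypothetical documents: a date and a sentiment value each
docs <- data.frame(
  date = as.Date(c("2020-01-05", "2020-01-20", "2020-02-10",
                   "2020-03-03", "2020-03-25", "2020-04-15")),
  sentiment = c(1, 3, -2, 0, 4, 1)
)

# 1) Map each date to its period (here: the first day of its month)
docs$period <- as.Date(format(docs$date, "%Y-%m-01"))

# 2) Average the sentiment of all documents within each period
monthly <- aggregate(sentiment ~ period, data = docs, FUN = mean)

# 3) Trailing rolling mean over `window` periods (sides = 1 uses past values
#    only); the first window - 1 values are NA, hence na.omit() later on
window <- 2
monthly$smoothed <- as.numeric(
  stats::filter(monthly$sentiment, rep(1 / window, window), sides = 1)
)
monthly
```

Here the monthly averages are 2, -2, 2 and 1, and the smoothed series is NA, 0, 0 and 1.5. The sketch simply skips months without documents instead of filling them.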

Estimating the topic model makes it possible to enrich this sentiment series with topical content. The model should be run for enough iterations to produce satisfactory topics. Labeling the topics then makes the subsequent analysis easier to read.

# Estimate the model with 1000 iterations
lda <- fit(lda, 1000)
sentopics_labels(lda) <- list(
  topic = c(
    "Economic growth & Inflation", "Banking", "Payment services",
    "European single market", "Monetary policy & Negative rate",
    "Monetary policy & Price stability", "Others", "Banking supervision",
    "Financial markets"
  )
)
plot(lda)

The estimated topic model adds a layer of topic proportions to the existing documents, which appears clearly when using melt() on the model. Combining the topic and sentiment information at the document level, we can compute the share of each document’s sentiment that belongs to a given topic.

document_datas <- sentopics::melt(lda, include_docvars = TRUE)
head(document_datas)
#                          topic       prob      .date    .id doc_id
#                         <fctr>      <num>     <Date> <char> <char>
# 1: Economic growth & Inflation 0.07692308 1998-06-09    1_1      1
# 2: Economic growth & Inflation 0.03125000 1998-06-09    1_2      1
# 3: Economic growth & Inflation 0.06666667 1998-06-09    1_3      1
# 4: Economic growth & Inflation 0.10526316 1998-06-09    1_4      1
# 5: Economic growth & Inflation 0.05000000 1998-06-09    1_5      1
# 6: Economic growth & Inflation 0.04347826 1998-06-09    1_6      1
#                                           title
#                                          <char>
# 1: ECB Press conference: Introductory statement
# 2: ECB Press conference: Introductory statement
# 3: ECB Press conference: Introductory statement
# 4: ECB Press conference: Introductory statement
# 5: ECB Press conference: Introductory statement
# 6: ECB Press conference: Introductory statement
#                                                                section_title
#                                                                       <char>
# 1: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 2: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 3: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 4: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 5: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 6: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
#    .sentiment .sentiment_scaled
#         <num>             <num>
# 1:   0.000000        -0.2878194
# 2:   0.000000        -0.2878194
# 3:   0.000000        -0.2878194
# 4:   3.323077        13.9981400
# 5:   0.800000         3.1513930
# 6:   0.000000        -0.2878194
head(document_datas[, list(.date, topic, share_of_sentiment = prob * .sentiment), keyby = ".id"])
# Key: <.id>
#       .id      .date                             topic share_of_sentiment
#    <char>     <Date>                            <fctr>              <num>
# 1:  100_1 2006-06-08       Economic growth & Inflation                  0
# 2:  100_1 2006-06-08                           Banking                  0
# 3:  100_1 2006-06-08                  Payment services                  0
# 4:  100_1 2006-06-08            European single market                  0
# 5:  100_1 2006-06-08   Monetary policy & Negative rate                  0
# 6:  100_1 2006-06-08 Monetary policy & Price stability                  0

Using this share of sentiment and the documents’ dates, one can compute two additional outputs: a breakdown of the sentiment time series and a time series of the sentiment expressed by each topic. The two outputs differ in how documents are aggregated. The breakdown averages the documents’ shares of sentiment with equal weights, whereas the sentiment expressed by a topic weights each document by its attention to that topic. These two aggregations are implemented by the sentiment_breakdown() and sentiment_topics() functions.

head(na.omit(sentiment_breakdown(lda, period = "month", rolling_window = 6)))
#              sentiment Economic growth & Inflation      Banking
# 1998-11-01  0.09010008                 -0.01904483  0.035112177
# 1998-12-01 -0.45559160                 -0.05440040 -0.014552201
# 1999-01-01 -0.42761571                 -0.04113759  0.002129632
# 1999-02-01 -0.41344980                 -0.03354954 -0.003704515
# 1999-03-01 -0.61087669                 -0.06876449 -0.031204869
# 1999-04-01 -0.68336368                 -0.09674215 -0.042874486
#            Payment services European single market
# 1998-11-01      -0.01152206            0.007737369
# 1998-12-01      -0.03587492           -0.045447569
# 1999-01-01      -0.02966671           -0.058568426
# 1999-02-01      -0.03432780           -0.064975734
# 1999-03-01      -0.04840891           -0.080776954
# 1999-04-01      -0.05984233           -0.088258889
#            Monetary policy & Negative rate Monetary policy & Price stability
# 1998-11-01                      0.03417696                      -0.009402267
# 1998-12-01                     -0.19745278                      -0.058856521
# 1999-01-01                     -0.18678102                      -0.056751272
# 1999-02-01                     -0.15999806                      -0.054176632
# 1999-03-01                     -0.13857232                      -0.064260554
# 1999-04-01                     -0.12059812                      -0.073725054
#                 Others Banking supervision Financial markets
# 1998-11-01 -0.02328005        -0.002840683        0.07916346
# 1998-12-01 -0.06161078        -0.035188863        0.04779243
# 1999-01-01 -0.04412574        -0.029959837        0.01724525
# 1999-02-01 -0.04344085        -0.034308726        0.01503206
# 1999-03-01 -0.04549492        -0.062533187       -0.07086049
# 1999-04-01 -0.03478823        -0.067518785       -0.09901563
head(na.omit(sentiment_topics(lda, period = "month", rolling_window = 6)))
#            Economic growth & Inflation     Banking Payment services
# 1998-11-01                  0.01447047  0.55534154      0.009189361
# 1998-12-01                 -0.56028053  0.03286343     -0.345959422
# 1999-01-01                 -0.40974223  0.18691915     -0.283770567
# 1999-02-01                 -0.33352783  0.11930929     -0.370911214
# 1999-03-01                 -0.69217666 -0.20873019     -0.584418031
# 1999-04-01                 -0.83630254 -0.32310287     -0.701357848
#            European single market Monetary policy & Negative rate
# 1998-11-01              0.1256498                     -0.08130855
# 1998-12-01             -0.6418995                     -0.65763798
# 1999-01-01             -0.7690328                     -0.62406240
# 1999-02-01             -0.8116283                     -0.54338464
# 1999-03-01             -0.9940525                     -0.51557313
# 1999-04-01             -1.0203800                     -0.44942975
#            Monetary policy & Price stability     Others Banking supervision
# 1998-11-01                       0.005121783 -0.2596372          0.07700759
# 1998-12-01                      -0.605566300 -0.5873370         -0.51761060
# 1999-01-01                      -0.583579101 -0.4442754         -0.45029075
# 1999-02-01                      -0.549137776 -0.4530577         -0.46669566
# 1999-03-01                      -0.655754068 -0.4757660         -0.72875939
# 1999-04-01                      -0.745984527 -0.4240332         -0.75708173
#            Financial markets
# 1998-11-01        0.78480930
# 1998-12-01        0.31233733
# 1999-01-01       -0.04814063
# 1999-02-01       -0.03424972
# 1999-03-01       -0.70150195
# 1999-04-01       -0.88836737
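The two weighting schemes can be compared on a single period with a base R sketch. It assumes the breakdown is the equal-weight mean of the shares and the topic sentiment is the document sentiment averaged with topic proportions as weights, following the description above rather than the package’s internal code; the data are hypothetical. The breakdown identity matches the output above: for 1998-11-01, the nine topic columns add up to the sentiment column (0.0901).

```r
# Hypothetical period with 4 documents and 2 topics (rows sum to 1)
theta <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.5, 0.5,
    0.6, 0.4),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("d", 1:4), c("T1", "T2"))
)
sentiment <- c(1, -2, 0.5, 1.5)
shares <- sweep(theta, 1, sentiment, `*`)

# Breakdown: equal-weight mean of each topic's share across documents;
# the components sum to the period's average sentiment
breakdown <- colMeans(shares)
stopifnot(isTRUE(all.equal(sum(breakdown), mean(sentiment))))

# Topic sentiment: document sentiment weighted by attention to each topic,
# i.e. sum(theta[, k] * sentiment) / sum(theta[, k])
topic_sentiment <- apply(theta, 2, function(w) weighted.mean(sentiment, w))

breakdown        # T1 = 0.4125, T2 = -0.1625
topic_sentiment  # T1 = 0.75,   T2 = about -0.361
```

The topic sentiment is not scaled by how much attention a topic receives overall, so it can be large for a topic whose contribution to the breakdown is small, as with Financial markets in November 1998 above.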

sentiment_breakdown() and sentiment_topics() also have built-in plotting counterparts, accessible with the plot_ prefix.

plot_sentiment_breakdown(lda, period = "month", rolling_window = 6)

plot_sentiment_topics(lda, period = "month", rolling_window = 6)
