<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">WEB</journal-id>
<journal-title-group><journal-title>Web Intelligence</journal-title></journal-title-group>
<issn pub-type="epub">2405-6464</issn><issn pub-type="ppub">2405-6456</issn><issn-l>2405-6456</issn-l>
<publisher>
<publisher-name>IOS Press</publisher-name><publisher-loc>Nieuwe Hemweg 6B, 1013 BG Amsterdam, The Netherlands</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">WEB345</article-id>
<article-id pub-id-type="doi">10.3233/WEB-160345</article-id>
<article-categories><subj-group subj-group-type="heading">
<subject>Research Article</subject></subj-group></article-categories>
<title-group>
<article-title>A topic-based sentiment analysis model to predict stock market price movement using Weibo mood</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Chen</surname><given-names>Wenhao</given-names></name><xref ref-type="aff" rid="affa">a</xref><xref ref-type="corresp" rid="cor1">*</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Cai</surname><given-names>Yi</given-names></name><xref ref-type="aff" rid="affb">b</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Lai</surname><given-names>Kinkeung</given-names></name><xref ref-type="aff" rid="affc">c</xref><xref ref-type="aff" rid="affd">d</xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Xie</surname><given-names>Haoran</given-names></name><xref ref-type="aff" rid="affe">e</xref>
</contrib>
<aff id="affa"><label>a</label>Department of Management Science, <institution>City University of Hong Kong</institution>, <country>Hong Kong</country>. E-mail: <email>wenhachen2-c@my.cityu.edu.hk</email></aff>
<aff id="affb"><label>b</label>School of Software Engineering, <institution>South China University of Technology</institution>, Guangzhou, <country>China</country>. E-mail: <email>ycai@scut.edu.cn</email></aff>
<aff id="affc"><label>c</label>Department of Industrial and Manufacturing Systems Engineering, <institution>Hong Kong University</institution>, <country>Hong Kong</country>. E-mail: <email>mskklai@cityu.edu.hk</email></aff>
<aff id="affd"><label>d</label>International Business School, <institution>Shaanxi Normal University</institution>, Xian, <country>China</country></aff>
<aff id="affe"><label>e</label>Department of Mathematics and Information Technology, <institution>The Education University of Hong Kong</institution>, <country>Hong Kong</country>. E-mail: <email>hrxie2@gmail.com</email></aff>
</contrib-group>
<contrib-group content-type="guest-editors">
<contrib contrib-type="guest-editor">
<name><surname>Tao</surname><given-names>Xiaohui</given-names></name>
</contrib>
<contrib contrib-type="guest-editor">
<name><surname>Huang</surname><given-names>Wei</given-names></name>
</contrib>
<contrib contrib-type="guest-editor">
<name><surname>Mu</surname><given-names>Xiangming</given-names></name>
</contrib>
<contrib contrib-type="guest-editor">
<name><surname>Xie</surname><given-names>Haoran</given-names></name>
</contrib>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>*</label>Corresponding author. E-mail: <email>wenhachen2-c@my.cityu.edu.hk</email>.</corresp>
</author-notes>
<pub-date pub-type="ppub"><year>2016</year></pub-date><volume>14</volume><issue>4</issue><issue-title>Knowledge management of web social media</issue-title><fpage>287</fpage><lpage>300</lpage>
<permissions><copyright-statement>IOS Press and the authors. All rights reserved</copyright-statement><copyright-year>2016</copyright-year></permissions>
<abstract>
<p>Over the past several years, with the development of the Internet, social media websites such as Twitter and Weibo have attracted much attention because of their enormous user bases. Much research has been done on sentiment analysis and opinion mining on these websites. However, research on using social media data to predict stock market price movement is limited. Behavioral economics and behavioral finance hold that public mood is correlated with economic indicators and that financial decisions are significantly driven by emotions. This paper first presents a Chinese emotion mining approach and then examines whether the public emotions or opinions expressed on Chinese social media websites can be used to predict stock market prices in China. The experimental results demonstrate that the emotions automatically extracted from large-scale Weibo posts represent real public opinions about specific topics related to the stock market in China. According to Granger causality analysis, some of the extracted public mood states, such as “Happiness” and “Disgust”, are highly correlated with changes in stock price. Finally, a nonlinear autoregressive model with exogenous sentiment inputs is proposed to predict stock price movement.</p>
</abstract>
<kwd-group>
<label>Keywords</label>
<kwd>Sentiment analysis</kwd>
<kwd>public mood</kwd>
<kwd>stock price forecasting</kwd>
<kwd>artificial neural network</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="x1-10001">
<label>1.</label>
<title>Introduction</title>
<p>For many years, a large body of research has addressed stock price prediction [<xref ref-type="bibr" rid="ref010">10</xref>]. Early research focused on the efficient market hypothesis and random walk theory [<xref ref-type="bibr" rid="ref008">8</xref>], which hold that stock prices are driven by new information, such as news, rather than by previous prices. Many different kinds of models have since been developed to forecast stock prices [<xref ref-type="bibr" rid="ref017">17</xref>,<xref ref-type="bibr" rid="ref029">29</xref>,<xref ref-type="bibr" rid="ref036">36</xref>]. Some recent research indicates that, although news itself is unpredictable, behavioral finance suggests that early indicators can be detected through different approaches. Many researchers suggest using online social media (blogs, microblogs, Facebook, etc.) to obtain these indicators and improve forecasting performance [<xref ref-type="bibr" rid="ref003">3</xref>,<xref ref-type="bibr" rid="ref024">24</xref>].</p>
<p>The number of social media users around the world has grown rapidly over the past 10 years. The exponentially increasing user-generated content on social media websites such as Facebook and Twitter provides a potential data source for opinion mining, as users share their opinions on these websites through text posts, pictures and even videos. In China, Weibo is one of the largest microblogging websites, and the platform is similar to Twitter. Several characteristics make social media websites a valuable source for mining public opinions or emotions. First, many users post their opinions about different topics, and these posts reflect their emotions to some extent. Second, the posts are highly time-sensitive: active users post their opinions every day, or even every hour. Third, some social media websites such as Weibo have an approval mechanism for registration intended to ensure that each user is a real person. Fourth, some organizations and companies use social media websites such as Facebook, Twitter and Weibo as formal channels to publish their views to the public. As a result, data collected from these websites can help researchers understand public mood states or emotions.</p>
<p>Recently, significant progress has been made in extracting public mood states directly from social media websites such as Twitter and using them to predict stock price movement [<xref ref-type="bibr" rid="ref014">14</xref>]. However, research on Chinese social media websites is limited, and research on applying the results of textual sentiment analysis of these websites to the Chinese stock market is even scarcer. In recent years, China has become the second largest economy in the world, and its stock market has a great impact on global stock markets. On 20th April 2015, the intraday trading volume on the Shanghai Stock Exchange (SSE) exceeded 1 trillion RMB, more than the intraday trading volume of any stock exchange in history. Besides the Shanghai Stock Exchange, China also has the Shenzhen Stock Exchange and the Hong Kong Stock Exchange. How stock market prices in China are affected by changes in public mood is therefore an important question for economists, traders and social scientists.</p>
<p>This paper investigates whether the public mood states extracted from Chinese social media websites such as Weibo represent the real opinions of the public towards the stock market, and how these mood states affect the stock market. Although research in behavioral economics and finance has already demonstrated a correlation between emotions and stock prices, this paper examines whether Weibo is a valid source for monitoring public mood states about the stock market and investigates the causal relationship between Weibo public mood states and stock prices. The paper first presents a topic-based approach to extract public mood states from Weibo. A Granger causality analysis is then used to test the hypothesis that the extracted public mood states are predictive of stock price movement in China. The experimental results indicate that some public mood states, such as “Happiness” and “Disgust”, are highly correlated with stock price movement. Finally, the nonlinear relationship between stock price movement and public mood on Weibo is discussed, and a topic-based nonlinear autoregressive model with exogenous Weibo mood inputs is proposed to predict future stock price movement. The model predicts the direction of price movement with competitive accuracy and could serve as a reference in investment decision making.</p>
</sec>
<sec id="x1-20002">
<label>2.</label>
<title>Background</title>
<p>In recent years, social media websites such as Facebook, Twitter and Weibo have become increasingly popular. Their user bases are enormous and include diverse user communities. Online users publish large amounts of data, including comments and news, on these websites, and this data, to some extent, represents public emotions or opinions. A growing body of research has attempted to integrate the results of sentiment analysis of such online data into different models.</p>
<sec id="x1-30002.1">
<label>2.1.</label>
<title>Public mood in social media websites</title>
<p>Social media are computer-mediated tools that allow people to create, share or exchange information. They take many forms, including blogs, microblogs, photo sharing, video sharing, forums, tagging systems and social networks. The increasing popularity of social media websites and Web 2.0 has led to exponential growth of user-generated content on the Internet, especially text. Abbasi et al. [<xref ref-type="bibr" rid="ref001">1</xref>] demonstrated that information retrieval and automated analysis techniques are useful for understanding online content such as forum posts and social interactions in online communities. By analyzing blog content, Liang et al. [<xref ref-type="bibr" rid="ref022">22</xref>] showed that a company can obtain first-hand knowledge of, or feedback from, its clients. Such analysis can also help to explain how online customer networks emerge and evolve [<xref ref-type="bibr" rid="ref005">5</xref>]. Information of this kind extracted from blogs enables organizations to make better decisions in critical business areas, which is important for business intelligence [<xref ref-type="bibr" rid="ref027">27</xref>]. In terms of integrating social media sentiment into applications in different industries, research has used sentiment from Twitter to forecast spikes in book sales [<xref ref-type="bibr" rid="ref014">14</xref>] and the box-office revenues of movies in North America [<xref ref-type="bibr" rid="ref024">24</xref>]. Another application area for text sentiment analysis is recommendation systems: many experiments have integrated the sentiment analysis results of online text into recommendation systems [<xref ref-type="bibr" rid="ref032">32</xref>].</p>
</sec>
<sec id="x1-40002.2">
<label>2.2.</label>
<title>Behavioral economics and finance</title>
<p>Behavioral economics and finance study the impact of psychological and emotional factors on decision making and their effects on stock prices, returns and market risk. Psychological research has shown that emotions, in addition to information, play a significant role in human decision making [<xref ref-type="bibr" rid="ref007">7</xref>]. Behavioral finance has further demonstrated that investors' decisions are strongly influenced by their emotions [<xref ref-type="bibr" rid="ref026">26</xref>]. The momentum generated by public mood, together with other factors such as economic factors, determines stock market prices.</p>
<p>Early research on stock market prediction focused on building models based on random walk theory and the Efficient Market Hypothesis [<xref ref-type="bibr" rid="ref009">9</xref>]. However, many studies show that stock market prices do not follow a random walk, and some researchers suggest that indicators of price movement can be extracted from online social media. Schumaker &amp; Chen [<xref ref-type="bibr" rid="ref031">31</xref>] applied machine learning methods to financial news articles and found that the sentiment in news articles has an immediate impact on the market price; their prediction model performed best when news article features were included. Based on community sentiment retrieved from posts on the Yahoo Finance forum through an expert classification system, Liu et al. [<xref ref-type="bibr" rid="ref023">23</xref>] showed a correlation between this sentiment and the stock price. Gilbert [<xref ref-type="bibr" rid="ref013">13</xref>], using LiveJournal as a source, extracted anxiety, worry and fear from posts and found that increases in expressions of anxiety indicated that the S&amp;P 500 Index would soon move downward. Bollen et al. [<xref ref-type="bibr" rid="ref003">3</xref>] investigated the correlation between collective mood states derived from large-scale Twitter feeds and the value of the Dow Jones Industrial Average (DJIA) over time. Also using Twitter posts, Zhang et al. [<xref ref-type="bibr" rid="ref035">35</xref>] found that emotions can be used to predict the NASDAQ and S&amp;P 500 indices. Li et al. [<xref ref-type="bibr" rid="ref020">20</xref>,<xref ref-type="bibr" rid="ref021">21</xref>] discussed the use of news, or summaries of news, to predict stock prices in Hong Kong. Rao et al. [<xref ref-type="bibr" rid="ref030">30</xref>] proposed a topic-level maximum entropy (TME) model for social emotion classification over short text.</p>
</sec>
<sec id="x1-50002.3">
<label>2.3.</label>
<title>Chinese social media analysis</title>
<p>Because Chinese differs considerably from English and is more complicated in terms of recognition, segmentation and analysis, research on Chinese social media mining is limited, and research on using the results of textual sentiment analysis to predict the stock market in China is even scarcer. In previous research, Gao et al. [<xref ref-type="bibr" rid="ref012">12</xref>] identified differences between Sina Weibo and Twitter. Yang et al. [<xref ref-type="bibr" rid="ref034">34</xref>] proposed a classifier to automatically detect rumors in a mixed set of Sina Weibo posts. For sentiment analysis, Fan et al. [<xref ref-type="bibr" rid="ref011">11</xref>] found that the correlation of anger among users is significantly higher than that of joy on Sina Weibo. There is also research on multilingual sentiment analysis: Ahmad et al. [<xref ref-type="bibr" rid="ref002">2</xref>] developed a local grammar approach that works equally well on English, Chinese (Sino-Asiatic) and Arabic (Semitic).</p>
<p>The stock market in China, including the Shanghai, Shenzhen and Hong Kong Stock Exchanges, is already one of the largest stock markets in the world. Turnover in these exchanges is enormous and stock price volatility is high. As a result, trading on these exchanges is risky, especially on the Shanghai and Shenzhen Stock Exchanges, which are still considered emerging markets in which regulations and rules are not yet mature and stock prices are more easily influenced by inside information, special events and public emotions. It is therefore an important research topic to find out how price movement is affected by behavioral factors, especially public emotions. This is the area this paper addresses: it demonstrates that the public mood derived from Weibo is correlated with stock price movement in China. In addition, the paper proposes a Chinese sentiment analysis approach to extract public mood states from Weibo and a nonlinear autoregressive exogenous model that uses Weibo mood to predict stock price movement.</p>
</sec>
</sec>
<sec id="x1-60003">
<label>3.</label>
<title>Data and system framework overview</title>
<p>Some research has already used the overall public mood from Twitter to predict stock market prices [<xref ref-type="bibr" rid="ref003">3</xref>]. However, social media websites contain many unrelated posts. To study the impact of emotions on a stock, researchers need to find the text that is highly related to that stock. For example, even if most Twitter users comment negatively on a movie, this should not have a big impact on the stock price of an oil company. To filter out such noise, this paper uses a topic-based approach to analyze public mood and discusses how users’ comments or emotions about these topics affect the stock market. Owing to the characteristics of social media websites, discussions and news on sites such as Twitter and Weibo are organized around topics. For example, many Weibo posts discuss the impact of the “Shenzhen-Hong Kong Stock Connect (SZ-HK connect)”, and Twitter users use hashtags to identify topics for discussion. Extracting public mood states or opinions by topic has two advantages. First, it filters out unrelated information and focuses only on opinions about the specific topic. Second, the public mood states for specific topics can easily be reused to study the relationships between different topics and data. One of the data sources used in this paper is the text content of the Chinese social media website Weibo, which was launched in August 2009. As of mid-2014, Weibo had 167 million monthly active users, who published more than 25 million posts each day.</p>
<p>The database used in this paper is a collection of public Weibo posts recorded from Jan 2015 to Feb 2016. For each post, the database provides an identifier, the date and time of submission and the text content. All posts were crawled from Weibo.com. The posts were then grouped by date, and stop-words and punctuation were removed from each post. As this paper discusses the public mood related to the Chinese stock market, three topics were selected: “Shanghai-Hong Kong Stock Connect (SH-HK connect) and Shenzhen-Hong Kong Stock Connect (SZ-HK connect)”, “Interest rate and reserve rate cut in China” and “Greece Government-Debt Crisis”. These three topics were selected because they were the most discussed stock-market-related topics on the website from 1st Jan 2015 to 31st Dec 2015. SH-HK connect was launched in November 2014, and SZ-HK connect was planned for launch in 2016. These programmes allow individual and institutional traders in Shanghai or Shenzhen to trade on the Hong Kong stock market through the Connect, and allow traders in Hong Kong to trade on the Shanghai or Shenzhen stock markets. They provide new choices and arbitrage opportunities for traders in Hong Kong and Mainland China. The Connect is considered a milestone in the development of the Chinese stock market, and it is widely expected to have a positive impact on that market. The Greece Government-Debt Crisis refers to the Greek debt crisis in the European Union (EU) in 2015; people worried that if Greece could not repay its debt on time, it would cause a financial crisis in the EU and trigger another international crisis. The interest rate and reserve rate cut refers to China's central bank cutting the bank lending interest rate and the amount of cash banks must keep in reserve. This increases the flow of money in the economy and reduces the cost of financing, supporting investment and the development of the real economy. 
As it helps to boost economic growth, it has a positive impact on the stock market. The second data source is the daily closing price of stocks on the Hong Kong and Shanghai stock exchanges, which can be downloaded from publicly available websites such as Yahoo!Finance and <uri>http://finance.sina.com.cn/</uri>.</p>
<p>The system framework can be summarized as follows. In the first phase, the system filters the posts and extracts only those containing keywords related to the three topics; posts under the same topic are grouped together. In the second phase, two tools are used to measure the sentiment scores of different emotions in the posts. The first tool, Jieba, analyzes the Chinese text and segments it into meaningful Chinese words; it is a useful tool for segmenting user-generated content in forums [<xref ref-type="bibr" rid="ref033">33</xref>]. The second tool, the Chinese Emotion Word Ontology (CEWO), is a Chinese sentiment dictionary that classifies Chinese emotion words into 7 categories [<xref ref-type="bibr" rid="ref006">6</xref>]. A novel method based on the CEWO is applied to the dataset to generate the sentiment score, or mood state, of each post. The sentiment scores of all posts under the same topic on the same day are then combined, so that daily sentiment scores for the different emotions are generated for each topic. The sentiment scores across the observation period can be regarded as time series. In the third phase, Granger causality analysis is used to test the hypothesis that the public mood measured by this mechanism is predictive of future stock prices. As the generated public mood relates to different stocks, its impact on different kinds of stocks is discussed. Finally, a nonlinear autoregressive model with exogenous Weibo mood inputs is proposed to predict the price and movement direction of a stock.</p>
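<p>A minimal sketch of the first phase is given below, assuming posts arrive as (identifier, day, text) tuples. The keyword lists are hypothetical placeholders, since the actual filtering keywords are not listed here, and simple substring matching is an assumption; under this sketch a post that mentions several topics is assigned to each of them.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from datetime import date

# Hypothetical keyword lists for the three topics (illustrative only,
# not the keywords used in the paper).
TOPIC_KEYWORDS = {
    "connect": ["沪港通", "深港通"],
    "rate_cut": ["降息", "降准"],
    "greece": ["希腊债务"],
}


def group_posts_by_topic(posts):
    """Keep posts that mention a topic keyword and group their ids by (topic, day).

    `posts` is an iterable of (post_id, day, text) tuples. Substring matching
    is an assumption; a post matching several topics is added to each of them.
    """
    groups = defaultdict(list)
    for post_id, day, text in posts:
        for topic, keywords in TOPIC_KEYWORDS.items():
            if any(keyword in text for keyword in keywords):
                groups[(topic, day)].append(post_id)
    return dict(groups)


# Illustrative posts: the second one mentions none of the topics.
posts = [
    (1, date(2015, 5, 10), "央行宣布降息降准"),
    (2, date(2015, 5, 10), "今天天气很好"),
    (3, date(2015, 4, 8), "沪港通额度用完"),
]
groups = group_posts_by_topic(posts)
print(groups)
```
</preformat>
<p>Grouping by (topic, day) keys prepares the data directly for the daily aggregation performed in the second phase.</p>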
</sec>
<sec id="x1-70004">
<label>4.</label>
<title>Data processing: A new Chinese sentiment analysis method</title>
<p>As mentioned above, the raw text data is first stored in a database. The Chinese segmentation tool Jieba is then used to segment the text into words. Jieba uses dynamic programming to find the most probable segmentation based on word frequencies. The segmentation results are also stored in the database. The next step is to generate the sentiment score, or mood state, of each post. For this step, the model uses the Chinese Emotion Word Ontology constructed by the IR Lab at Dalian University of Technology. The ontology is based on Ekman's theory of 6 basic emotions (Happiness, Sadness, Surprise, Fear, Disgust and Anger), with one more emotion, “Good”, added to make it better suited to Chinese language analysis [<xref ref-type="bibr" rid="ref006">6</xref>]. Figure <xref rid="x1-70011">1</xref> shows the mapping from English to Chinese for the 7 categories of emotion words. These categories are also regarded as different dimensions of public mood. Each emotion word in the ontology has its own category tag (such as “Happiness”), an intensity value and a polarity value.</p>
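<p>The dynamic-programming idea behind frequency-based segmentation can be illustrated with the simplified sketch below. It is not Jieba's implementation: the toy frequency dictionary is hypothetical, and unknown single characters are simply assigned a frequency of 1 so that every string can be segmented. Working backwards from the end of the sentence, the sketch keeps, for each start position, the split of the remaining text with the highest total log-probability. In practice the library itself would be called, for example through <monospace>jieba.lcut</monospace>.</p>
<preformat preformat-type="code">
```python
import math

# Hypothetical toy dictionary: word -> corpus frequency.
FREQ = {"股市": 50, "股": 5, "市": 10, "大涨": 30, "大": 20, "涨": 15}
LOG_TOTAL = math.log(sum(FREQ.values()))


def segment(sentence):
    """Maximum-probability segmentation by dynamic programming (simplified sketch).

    best[i] = (log-probability, end index of first word) for the best
    segmentation of sentence[i:].
    """
    n = len(sentence)
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            # Unknown single characters get frequency 1; unknown longer strings are skipped.
            freq = FREQ.get(word, 1 if j == i + 1 else 0)
            if freq:
                candidates.append((math.log(freq) - LOG_TOTAL + best[j][0], j))
        best[i] = max(candidates)
    # Follow the stored end indices to recover the words.
    words, i = [], 0
    while i != n:
        j = best[i][1]
        words.append(sentence[i:j])
        i = j
    return words


print(segment("股市大涨"))  # prints ['股市', '大涨']
```
</preformat>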
<fig id="x1-70011">
<label>Fig. 1.</label>
<caption>
<p>The 7 Categories in the Chinese Emotion Word Ontology.</p>
</caption>
<graphic xlink:href="345f01.jpg"/>
</fig>
<p>The proposed method for generating the sentiment score or public mood states consists of 4 steps. First, Jieba is used to segment each post into a list of meaningful words, each tagged with its lexical class, such as noun, verb, adverb or adjective. Second, each word in the list is looked up in the Chinese Emotion Word Ontology according to its lexical class. If a word <italic>i</italic> is found in the ontology, its intensity value is recorded as <inline-formula><mml:math id="math001">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula>. Third, the sentiment score of the post is calculated as follows: 
<disp-formula>
<mml:math display="block" id="math002">
<mml:mtable displaystyle="true"><mml:mlabeledtr>
<mml:mtd>
<mml:mtext>(1)</mml:mtext>
</mml:mtd>
<mml:mtd>
<mml:mi mathvariant="italic">y</mml:mi>
<mml:mo>=</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mo movablelimits="false">max</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo movablelimits="false">…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>7</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mo mathvariant="normal" fence="true" maxsize="2.03em" minsize="2.03em">(</mml:mo>
<mml:munder>
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo stretchy="false">∈</mml:mo>
<mml:mi mathvariant="italic">I</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">(</mml:mo>
<mml:mi mathvariant="italic">j</mml:mi>
<mml:mo mathvariant="normal" fence="true" stretchy="false">)</mml:mo>
</mml:mrow>
</mml:munder>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">s</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal" fence="true" maxsize="2.03em" minsize="2.03em">)</mml:mo>
</mml:mtd>
</mml:mlabeledtr></mml:mtable></mml:math>
</disp-formula>
</p>
<p>Here <italic>I</italic>(<italic>j</italic>) is the set of words in the post that are found in the <italic>j</italic>th emotion category of the ontology; for example, <italic>I</italic>(<italic>1</italic>) contains the words found in the first category, “Happiness”. The algorithm assumes that each post expresses only one kind of emotion, namely the category with the highest total intensity value (sentiment score). As a result, each post in the database is assigned a tag identifying its category, such as “Happiness”. In the fourth step, the number of posts in each emotion category is counted for each day. This value is denoted <inline-formula><mml:math id="math003">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula>, where <italic>i</italic> denotes one of the 7 emotion categories and <italic>t</italic> denotes day <italic>t</italic> of the observation period. The time series for emotion <italic>i</italic> is then denoted <inline-formula><mml:math id="math004">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula>.</p>
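<p>Steps two to four can be sketched as follows, assuming each post has already been segmented into words. The ontology fragment and its intensity values are hypothetical placeholders rather than actual CEWO entries; the lookup ignores lexical class, ties between categories are broken by first occurrence, and posts containing no emotion words are left unlabelled. None of these details is specified above. The function <monospace>classify_post</monospace> returns the category attaining the maximum in Eq. (1), and <monospace>daily_counts</monospace> produces the daily counts x<sub>i,t</sub>.</p>
<preformat preformat-type="code">
```python
from collections import Counter, defaultdict

CATEGORIES = ["Happiness", "Good", "Anger", "Sadness", "Fear", "Disgust", "Surprise"]

# Hypothetical ontology fragment: word -> (emotion category, intensity s_i).
ONTOLOGY = {
    "高兴": ("Happiness", 7),
    "利好": ("Good", 5),
    "担心": ("Fear", 5),
    "讨厌": ("Disgust", 7),
}


def classify_post(words):
    """Sum intensities per category (the inner sum of Eq. (1)) and return the
    category with the largest sum, or None if no ontology word occurs."""
    scores = defaultdict(float)
    for word in words:
        if word in ONTOLOGY:
            category, intensity = ONTOLOGY[word]
            scores[category] += intensity
    if not scores:
        return None
    return max(scores, key=scores.get)  # ties: first category encountered wins


def daily_counts(posts):
    """Count x[i][t], the number of posts on day t tagged with emotion i.
    `posts` is an iterable of (day, segmented_words) pairs."""
    x = {category: Counter() for category in CATEGORIES}
    for day, words in posts:
        category = classify_post(words)
        if category is not None:
            x[category][day] += 1
    return x


def emotion_series(x, category, days):
    """Time series X_i for one emotion over an ordered list of days."""
    return [x[category][day] for day in days]


# Illustrative, already-segmented posts.
posts = [
    ("2015-05-10", ["央行", "降息", "利好"]),
    ("2015-05-10", ["高兴", "担心"]),
    ("2015-05-11", ["讨厌", "担心"]),
]
x = daily_counts(posts)
print(emotion_series(x, "Happiness", ["2015-05-10", "2015-05-11"]))  # prints [1, 0]
```
</preformat>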
<p>After the raw Weibo text is processed, the model generates 7 time series for each topic, each representing the movement of one dimension of public mood or emotion, such as “Happiness”. The other data source is the time series of stock prices. The closing prices are downloaded from Yahoo!Finance and other public websites. As the emotion time series are associated with different stocks, the closing prices from 1st Jan 2015 to 31st Mar 2016 were downloaded for 4 securities: the Hang Seng Index (HSI), the Shanghai Stock Exchange Composite Index (SSECI), Hong Kong Exchanges and Clearing Limited (stock code 388) and the Shanghai Connect Index (stock code 000159).</p>
</sec>
<sec id="x1-80005">
<label>5.</label>
<title>Public mood validation</title>
<p>To validate the ability of the proposed method to capture various aspects of public mood, the algorithm is applied to the Weibo posts published from 1st Jan 2015 to 31st Mar 2016. The distribution of the posts indicates that most posts on positive topics, such as the “interest rate cut”, carry positive emotions such as “Happiness” and “Good”, and vice versa. The distribution of the posts is shown in Fig. <xref rid="x1-80012">2</xref>.</p>
<fig id="x1-80012">
<label>Fig. 2.</label>
<caption>
<p>The percentage distribution of the posts with different emotions under 3 topics.</p>
</caption>
<graphic xlink:href="345f02.jpg"/>
</fig>
<fig id="x1-80023">
<label>Fig. 3.</label>
<caption>
<p>Public mood states from March to June 2015, showing the public response to the announcement of the interest rate cut.</p>
</caption>
<graphic xlink:href="345f03.jpg"/>
</fig>
<p>As mentioned, the interest rate and reserve rate cuts boost the economy and the stock market. The Chinese government cut the interest rate or reserve rate three times between 1st Jan and 31st July 2015 (on 28th Feb, 10th May and 27th June). As a result, most posts about the interest rate and reserve rate cut fall into the positive emotions “Happiness” and “Good”, which together account for 85% of all related posts. The distribution of the posts related to SH-HK and SZ-HK connect is similar. However, the distribution of the posts related to the Greece government-debt crisis is different. As many people were worried about the future of the European Union financial system and whether the Greek crisis would cause an international government-debt crisis, almost 27% of these posts are classified as “Disgust”. Many posts also fall into the “Good” category, as the Greek government-debt problem was resolved in July. Since the posts are mainly distributed across three categories, “Happiness”, “Good” and “Disgust”, this paper discusses the correlation between the stock market and these three emotions.</p>
<p>Figure <xref rid="x1-80023">3</xref> shows two public mood time series after data processing, the “Happiness” and “Good” time series for the topic “interest rate and reserve rate cut”. On 10th May, China's central bank announced a new round of interest rate and reserve rate cuts, an event that may have a unique, significant and complex effect on public mood. The result shows that the time series generated by the approach successfully capture the public emotional response to the announcement: both “Happiness” and “Good” show a significant but short-lived uptick on that day, with the sentiment score rising sharply immediately after the announcement. In addition, the sentiment score starts to climb before the announcement, as rumors appeared in the market that the government would soon cut the interest rate, raising expectations of a new round of cuts.</p>
<fig id="x1-80034">
<label>Fig. 4.</label>
<caption>
<p>The z-score of the Hang Seng Index delta price vs the z-score of the Happiness time series.</p>
</caption>
<graphic xlink:href="345f04.jpg"/>
</fig>
<p>To visualize the correlation between the public mood time series and stock price movement, the data set for the topic “Shanghai-Hong Kong Stock Connect (SH-HK connect) and Shenzhen-Hong Kong Stock Connect (SZ-HK connect)” for the period 1st April to 31st May is used. This period is chosen because on 8th April, Chinese investors, for the first time, used the entire 10.5 billion RMB daily quota in a cross-border programme buying Hong Kong stocks, boosting turnover under the Shanghai-Hong Kong Stock Connect to a record. The buying pushed the Hang Seng Index up 3.8 percent to its highest level in nearly seven years.</p>
<p>Two time series are plotted in Fig. <xref rid="x1-80034">4</xref>. The first is the daily close price of the Hang Seng Index, and the second is the time series for the public mood state “Happiness”. To normalize the two time series so that they can be compared, the z-score of their daily change (delta value) is calculated. As shown in Fig. <xref rid="x1-80034">4</xref>, the two time series frequently overlap or move in the same direction, and their peaks and troughs largely coincide. On 8th April, both time series reach their peak with almost the same z-score, more than 1.5 standard deviations above the mean (Hang Seng Index: 1.74; Happiness: 1.68). This behavior suggests that the “Happiness” time series generated by the approach correctly reflects the change in public mood in response to positive events and is related to the Hang Seng Index price. The correlation between these two time series is discussed in more detail in the next section.</p>
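<p>The comparison in Fig. 4 standardizes the daily change of each series before plotting. A minimal Python sketch of that step follows, under stated assumptions: the z-scores are taken over the whole plotted window with the population standard deviation (the paper does not say which variant is used), and the helper names deltas, zscores and same_side_ratio, as well as the short price and mood series, are hypothetical.</p>
<preformat preformat-type="code">
```python
import statistics


def deltas(series):
    """Daily change: series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def zscores(values):
    """Standardize values with their mean and population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def same_side_ratio(za, zb):
    """Fraction of days on which both z-scores lie on the same side of zero."""
    agree = sum(1 for a, b in zip(za, zb) if a * b > 0)
    return agree / len(za)


# Hypothetical close prices and daily "Happiness" scores (not the paper's data).
close = [100.0, 102.0, 101.0, 105.0, 104.0]
happiness = [0.20, 0.30, 0.25, 0.50, 0.45]

z_close = zscores(deltas(close))
z_mood = zscores(deltas(happiness))
print(same_side_ratio(z_close, z_mood))  # 1.0
```
</preformat>
<p>In this toy example both standardized series also peak on the same day (index 2), which is the kind of coincidence Fig. 4 shows for 8th April.</p>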
</sec>
<sec id="x1-90006">
<label>6.</label>
<title>Granger causality analysis</title>
<p>Many economic time series are unit root processes [<xref ref-type="bibr" rid="ref025">25</xref>]. Therefore, the Augmented Dickey-Fuller (ADF) test is applied to each time series to identify unit root processes. Because the test indicates that the generated time series are unit root processes, first differencing is used to minimize the chance of spurious relationships. Each public mood time series that is not covariance stationary is converted to <inline-formula><mml:math id="math005">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula>,
<disp-formula>
<mml:math display="block" id="math006">
<mml:mtable displaystyle="true"><mml:mlabeledtr>
<mml:mtd>
<mml:mtext>(2)</mml:mtext>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">Δ</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mlabeledtr></mml:mtable></mml:math>
</disp-formula>
</p>
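<p>The stationarity check and Eq. (2) can be sketched in Python with NumPy. As a simplification, the sketch below implements the plain Dickey-Fuller regression without the augmenting lagged differences of the ADF test, and compares its t-statistic with the approximate 5% critical value of -2.86 for a model with a constant and no trend; a full ADF test with lag selection is available as adfuller in statsmodels.tsa.stattools. The function names and the simulated random walk are illustrative assumptions, not the paper's code or data.</p>
<preformat preformat-type="code">
```python
import numpy as np

# Approximate 5% critical value of the Dickey-Fuller test with a constant (no trend).
DF_CRIT_5PCT = -2.86


def first_difference(x):
    """Eq. (2): D_t = x_t - x_{t-1}."""
    return np.diff(np.asarray(x, dtype=float))


def dickey_fuller_t(x):
    """t-statistic of rho in  dx_t = a + rho * x_{t-1} + e_t  (no augmentation lags)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    coef, _, _, _ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ coef
    sigma2 = resid @ resid / (len(dx) - X.shape[1])
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_rho


def has_unit_root(x):
    """True when the unit-root null cannot be rejected at the 5% level."""
    return bool(dickey_fuller_t(x) > DF_CRIT_5PCT)


rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=500))  # simulated random walk (unit root)
d = first_difference(walk)
print(len(d), has_unit_root(d))  # 499 False
```
</preformat>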
<p>To avoid the effect of various magnitudes, the public mood time series <inline-formula><mml:math id="math007">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is normalized by computing the z-scores:
<disp-formula>
<mml:math display="block" id="math008">
<mml:mtable displaystyle="true"><mml:mlabeledtr>
<mml:mtd>
<mml:mtext>(3)</mml:mtext>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo><mml:mstyle displaystyle="true">
<mml:mfrac>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>−</mml:mo><mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">¯</mml:mo></mml:mover>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mfrac>
</mml:mstyle>
</mml:mtd>
</mml:mlabeledtr></mml:mtable></mml:math>
</disp-formula>
</p>
<p>Where <inline-formula><mml:math id="math009"><mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math></inline-formula> is the average of <inline-formula><mml:math id="math010">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math011">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is the standard deviation of <inline-formula><mml:math id="math012">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> over a time window of <italic>n</italic> days. Because the experiments also use moving averages of stock prices, <italic>n</italic> is set to 10, and <inline-formula><mml:math id="math013"><mml:mover accent="true">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">D</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo stretchy="false">¯</mml:mo></mml:mover></mml:math></inline-formula> is the 10-day moving average of the sentiment score change.</p>
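<p>The normalization with a 10-day window can be sketched as a rolling z-score. This Python sketch assumes a trailing window that ends at (and includes) day t, uses the population standard deviation in line with the text's definition of σ<sub>i</sub> as a standard deviation, and maps a zero-variance window to 0.0; these choices and the function name rolling_zscores are assumptions rather than details stated in the paper.</p>
<preformat preformat-type="code">
```python
import statistics


def rolling_zscores(values, n=10):
    """Z_t = (D_t - mean) / std, with mean and std taken over the last n days.

    Returns None for the first n - 1 days, where a full window is not yet available.
    A flat window (std == 0) is mapped to 0.0 to avoid dividing by zero.
    """
    out = []
    for t, value in enumerate(values):
        if t + 1 >= n:
            window = values[t + 1 - n : t + 1]
            mu = statistics.fmean(window)
            sd = statistics.pstdev(window)
            out.append((value - mu) / sd if sd else 0.0)
        else:
            out.append(None)
    return out


# Hypothetical daily sentiment score changes D_t.
d = [float(v) for v in range(1, 13)]
z = rolling_zscores(d, n=10)
print(z[:9].count(None), round(z[9], 4))  # 9 1.5667
```
</preformat>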
<p>The stock price time series is defined to reflect daily changes in stock market value, i.e., the delta value between the close price of a stock on day <italic>t</italic> and day <inline-formula><mml:math id="math014">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:math></inline-formula>. The delta values are also converted to z-scores to standardize the stock price changes. <inline-formula><mml:math id="math015">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> denotes the z-score of the delta. To examine whether variations in public mood states affect stock prices, this paper applies Granger causality analysis to the time series generated by the new Chinese sentiment analysis method, <inline-formula><mml:math id="math016">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula>, and the stock close price time series, <inline-formula><mml:math id="math017">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula>. The Granger causality analysis follows the linear model in Eq. (<xref rid="x1-90034">4</xref>) over the period from 1st January to 31st July 2015. Because the three topics relate to different stocks, a separate regression is run between each relevant stock price time series and the corresponding public mood time series.
<disp-formula>
<mml:math display="block" id="math018">
<mml:mtable displaystyle="true"><mml:mlabeledtr>
<mml:mtd id="x1-90034">
<mml:mtext>(4)</mml:mtext>
</mml:mtd>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="italic">α</mml:mi>
<mml:mo>+</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">β</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:munderover accentunder="false" accent="false">
<mml:mrow>
<mml:mstyle displaystyle="true">
<mml:mo largeop="true" movablelimits="false">∑</mml:mo></mml:mstyle>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">γ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">ε</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mlabeledtr></mml:mtable></mml:math>
</disp-formula>
</p>
<p>Where <inline-formula><mml:math id="math019">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">P</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is the z-score of the difference in a stock’s daily close price between day <italic>t</italic> and day <inline-formula><mml:math id="math020">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="math021">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is the z-score of the sentiment score change of a specific public mood dimension, or emotion, such as “Happiness”. In Eq. (<xref rid="x1-90034">4</xref>), the maximum lag <italic>n</italic> is set to 7 so that the tested lags include the one highlighted by previous research [<xref ref-type="bibr" rid="ref003">3</xref>], which has demonstrated that public mood on day <inline-formula><mml:math id="math022">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn></mml:math></inline-formula> has the highest correlation with the stock price on day <italic>t</italic>.</p>
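<p>Eq. (4) can be tested with the standard Granger procedure, which compares a restricted regression of P<sub>t</sub> on its own n lags with the full regression that also includes the n lags of Z<sub>t</sub>. The paper reports per-lag regression statistics; the Python/NumPy sketch below instead shows the usual joint F-test of γ<sub>1</sub> = … = γ<sub>n</sub> = 0, so it is a swapped-in formulation rather than the authors' code. The helper names and the synthetic data are hypothetical; if SciPy is available, a p-value can be obtained from the returned values with scipy.stats.f.sf(f_stat, *dof).</p>
<preformat preformat-type="code">
```python
import numpy as np


def lag_matrix(series, n):
    """Columns series[t-1], ..., series[t-n] for every t from n to T-1."""
    s = np.asarray(series, dtype=float)
    T = len(s)
    return np.column_stack([s[n - i : T - i] for i in range(1, n + 1)])


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)


def granger_f(p, z, n=7):
    """Joint F-test of H0: gamma_1 = ... = gamma_n = 0 in Eq. (4).

    Returns the F statistic and its (numerator, denominator) degrees of freedom.
    """
    p = np.asarray(p, dtype=float)
    y = p[n:]
    const = np.ones((len(y), 1))
    restricted = np.hstack([const, lag_matrix(p, n)])  # alpha + sum beta_i P_{t-i}
    full = np.hstack([restricted, lag_matrix(z, n)])  # ... + sum gamma_i Z_{t-i}
    rss_r, rss_f = rss(restricted, y), rss(full, y)
    df_den = len(y) - full.shape[1]
    f_stat = ((rss_r - rss_f) / n) / (rss_f / df_den)
    return f_stat, (n, df_den)


# Synthetic check: P depends on Z three days earlier (hypothetical data).
rng = np.random.default_rng(1)
z = rng.normal(size=200)
p = np.zeros(200)
p[3:] = 0.8 * z[:-3] + 0.3 * rng.normal(size=197)
f_stat, dof = granger_f(p, z, n=7)
print(dof)  # (7, 178)
```
</preformat>
<p>With real data, p would hold the z-scored price changes P<sub>t</sub> and z the z-scored mood changes Z<sub>t</sub> for one topic and one stock.</p>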
<table-wrap id="x1-90041">
<label>Table 1</label>
<caption>
<p>The linear regression analysis result between the “Happiness” time series and the close price movement of Shanghai Connect Index</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Lag</td>
<td valign="top" align="center">Std Error</td>
<td valign="top" align="center">t stat</td>
<td valign="top" align="center">P-value</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1 Day</td>
<td valign="top" align="center">0.096308</td>
<td valign="top" align="char" char=".">−0.646502</td>
<td valign="top" align="center">0.519301</td>
</tr>
<tr>
<td valign="top" align="left">2 Day</td>
<td valign="top" align="center">0.098495</td>
<td valign="top" align="char" char=".">−1.300428</td>
<td valign="top" align="center">0.196172</td>
</tr>
<tr>
<td valign="top" align="left">3 Day</td>
<td valign="top" align="center">0.101275</td>
<td valign="top" align="char" char=".">−2.219358</td>
<td valign="top" align="center">0.028515</td>
</tr>
<tr>
<td valign="top" align="left">4 Day</td>
<td valign="top" align="center">0.103179</td>
<td valign="top" align="char" char=".">−0.251972</td>
<td valign="top" align="center">0.801533</td>
</tr>
<tr>
<td valign="top" align="left">5 Day</td>
<td valign="top" align="center">0.101925</td>
<td valign="top" align="char" char=".">−0.225403</td>
<td valign="top" align="center">0.822084</td>
</tr>
<tr>
<td valign="top" align="left">6 Day</td>
<td valign="top" align="center">0.098601</td>
<td valign="top" align="char" char=".">0.961458</td>
<td valign="top" align="center">0.338431</td>
</tr>
<tr>
<td valign="top" align="left">7 Day</td>
<td valign="top" align="center">0.093337</td>
<td valign="top" align="char" char=".">1.237520</td>
<td valign="top" align="center">0.218529</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>First, the topic “SH-HK and SZ-HK connect” is discussed. Table <xref rid="x1-90041">1</xref> presents the linear regression result between the close price time series of the Shanghai Connect Index (stock code: 000159) and the time series of the emotion “Happiness” related to the topic “SH-HK and SZ-HK connect”. The Shanghai Connect Index is a composite index of all stocks that international traders in Hong Kong can trade through the SH-HK connect. Its price movement is therefore likely to be directly affected by traders’ opinions or emotions about the SH-HK connect. The result shows that the lag value <inline-formula><mml:math id="math023">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> has a p-value of 0.028515, which is below the 0.05 significance level. Based on the Granger causality result, the null hypothesis that the mood series does not predict the price movement of the Shanghai Connect Index can be rejected, i.e., at least one of <inline-formula><mml:math id="math024">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">γ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>3</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>7</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo stretchy="false">≠</mml:mo>
<mml:mn>0</mml:mn></mml:math></inline-formula>. However, only the 3-day lag has a significant causal relation with the price movement of the Index, which is similar to Bollen’s research [<xref ref-type="bibr" rid="ref003">3</xref>] on Twitter posts, which reported that changes in public mood match shifts in stock prices that occur 3–4 days later. One explanation is that public mood on social media websites may take some time to gain the momentum needed to affect investors’ decisions in the real market. The time series of the other six emotions are not significantly correlated with the Shanghai Connect Index according to their p-values.</p>
<p>To test whether the “Happiness” time series can also forecast the prices of other stocks, the Granger causality p-values are calculated for two more stocks: HSI and the stock of Hong Kong Exchanges and Clearing Limited (stock code: 388). Both are highly related to the topic “SH-HK and SZ-HK connect”. HSI is the composite index of Hong Kong stocks and is affected by the SH-HK connect, as more capital can be invested in the Hong Kong market. Hong Kong Exchanges and Clearing Limited is the company that provides the SH-HK connect service: the more traders invest through the SH-HK connect, the more commission it receives. Higher expected profits change the company’s valuation and may attract more investors, raising its stock price. The results are shown in Table <xref rid="x1-90052">2</xref>. With a p-value of 0.045231, the lag value <inline-formula><mml:math id="math025">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is significantly correlated with the price movement of HSI. Similarly, the price movement of stock 388 is significantly correlated with the lag value <inline-formula><mml:math id="math026">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>4</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula>, with a p-value of 0.010188, well below the 0.05 significance level. The null hypothesis can therefore be rejected for all three stocks, suggesting that the “Happiness” time series is predictive of the price movements of the Shanghai Connect Index, HSI and the stock of Hong Kong Exchanges and Clearing Limited.</p>
<table-wrap id="x1-90052">
<label>Table 2</label>
<caption>
<p>P-values of the linear regression analysis between the “Happiness” time series and three different stocks</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Lag</td>
<td valign="top" align="center">000159</td>
<td valign="top" align="center">388</td>
<td valign="top" align="center">HSI</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1 Day</td>
<td valign="top" align="center">0.519301</td>
<td valign="top" align="center">0.096123</td>
<td valign="top" align="center">0.538607</td>
</tr>
<tr>
<td valign="top" align="left">2 Day</td>
<td valign="top" align="center">0.196172</td>
<td valign="top" align="center">0.546172</td>
<td valign="top" align="center">0.428106</td>
</tr>
<tr>
<td valign="top" align="left">3 Day</td>
<td valign="top" align="center">0.028515</td>
<td valign="top" align="center">0.458417</td>
<td valign="top" align="center">0.045231</td>
</tr>
<tr>
<td valign="top" align="left">4 Day</td>
<td valign="top" align="center">0.801533</td>
<td valign="top" align="center">0.010188</td>
<td valign="top" align="center">0.155256</td>
</tr>
<tr>
<td valign="top" align="left">5 Day</td>
<td valign="top" align="center">0.822084</td>
<td valign="top" align="center">0.130202</td>
<td valign="top" align="center">0.198888</td>
</tr>
<tr>
<td valign="top" align="left">6 Day</td>
<td valign="top" align="center">0.338431</td>
<td valign="top" align="center">0.220351</td>
<td valign="top" align="center">0.500008</td>
</tr>
<tr>
<td valign="top" align="left">7 Day</td>
<td valign="top" align="center">0.218529</td>
<td valign="top" align="center">0.934022</td>
<td valign="top" align="center">0.301673</td>
</tr>
</tbody>
</table>
</table-wrap>
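<p>The significance decisions drawn from Table 2 can be checked mechanically. The short Python sketch below copies the p-values from Table 2 and lists, for each stock, the lags whose p-value falls below a chosen significance level; the dictionary TABLE_2 and the function significant_lags are illustrative names.</p>
<preformat preformat-type="code">
```python
# P-values copied from Table 2, lags 1 to 7.
TABLE_2 = {
    "000159": [0.519301, 0.196172, 0.028515, 0.801533, 0.822084, 0.338431, 0.218529],
    "388": [0.096123, 0.546172, 0.458417, 0.010188, 0.130202, 0.220351, 0.934022],
    "HSI": [0.538607, 0.428106, 0.045231, 0.155256, 0.198888, 0.500008, 0.301673],
}


def significant_lags(pvalues, alpha=0.05):
    """Return the 1-based lags whose p-value is below the significance level alpha."""
    return [lag for lag, pv in enumerate(pvalues, start=1) if alpha > pv]


for stock, pvalues in TABLE_2.items():
    print(stock, significant_lags(pvalues), significant_lags(pvalues, alpha=0.1))
# 000159 [3] [3]
# 388 [4] [1, 4]
# HSI [3] [3]
```
</preformat>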
<p>To examine the predictive ability of public mood states for other topics, the Granger causality analysis is also applied to them. For the topic “Greece Government-Debt Crisis”, none of the public mood time series other than “Disgust” is significantly correlated with the price movement of HSI. Only the lag value <inline-formula><mml:math id="math027">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> of the “Disgust” time series has a significant causal relation with the price movement of HSI at the 0.1 significance level (p-value 0.068634). The results are shown in Table <xref rid="x1-90063">3</xref>.</p>
<table-wrap id="x1-90063">
<label>Table 3</label>
<caption>
<p>The linear regression analysis result between the “Disgust” time series about “Greece Government-Debt Crisis” and the close price of HSI</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Lag</td>
<td valign="top" align="center">Std Error</td>
<td valign="top" align="center">t stat</td>
<td valign="top" align="center">P-value</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1 Day</td>
<td valign="top" align="char" char=".">0.131315</td>
<td valign="top" align="char" char=".">−0.60719</td>
<td valign="top" align="char" char=".">0.546991</td>
</tr>
<tr>
<td valign="top" align="left">2 Day</td>
<td valign="top" align="char" char=".">0.146389</td>
<td valign="top" align="char" char=".">−1.86881</td>
<td valign="top" align="char" char=".">0.068634</td>
</tr>
<tr>
<td valign="top" align="left">3 Day</td>
<td valign="top" align="char" char=".">0.1709</td>
<td valign="top" align="char" char=".">0.635789</td>
<td valign="top" align="char" char=".">0.528363</td>
</tr>
<tr>
<td valign="top" align="left">4 Day</td>
<td valign="top" align="char" char=".">0.176888</td>
<td valign="top" align="char" char=".">1.476039</td>
<td valign="top" align="char" char=".">0.147392</td>
</tr>
<tr>
<td valign="top" align="left">5 Day</td>
<td valign="top" align="char" char=".">0.17898</td>
<td valign="top" align="char" char=".">1.233007</td>
<td valign="top" align="char" char=".">0.22443</td>
</tr>
<tr>
<td valign="top" align="left">6 Day</td>
<td valign="top" align="char" char=".">0.166583</td>
<td valign="top" align="char" char=".">0.024174</td>
<td valign="top" align="char" char=".">0.980829</td>
</tr>
<tr>
<td valign="top" align="left">7 Day</td>
<td valign="top" align="char" char=".">0.146273</td>
<td valign="top" align="char" char=".">−0.21275</td>
<td valign="top" align="char" char=".">0.832549</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="x1-90074">
<label>Table 4</label>
<caption>
<p>The linear regression analysis result between public mood states related to “interest rate cut” and the close price of SSECI</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Lag</td>
<td valign="top" align="center">Happiness</td>
<td valign="top" align="center">Good</td>
<td valign="top" align="center">Disgust</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1 Day</td>
<td valign="top" align="char" char=".">0.254616</td>
<td valign="top" align="char" char=".">0.515584</td>
<td valign="top" align="center">0.395321</td>
</tr>
<tr>
<td valign="top" align="left">2 Day</td>
<td valign="top" align="char" char=".">0.111199</td>
<td valign="top" align="char" char=".">0.455075</td>
<td valign="top" align="center">0.041898</td>
</tr>
<tr>
<td valign="top" align="left">3 Day</td>
<td valign="top" align="char" char=".">0.007864</td>
<td valign="top" align="char" char=".">0.0676</td>
<td valign="top" align="center">0.008356</td>
</tr>
<tr>
<td valign="top" align="left">4 Day</td>
<td valign="top" align="char" char=".">0.944177</td>
<td valign="top" align="char" char=".">0.201313</td>
<td valign="top" align="center">0.959366</td>
</tr>
<tr>
<td valign="top" align="left">5 Day</td>
<td valign="top" align="char" char=".">0.858916</td>
<td valign="top" align="char" char=".">0.925238</td>
<td valign="top" align="center">0.995641</td>
</tr>
<tr>
<td valign="top" align="left">6 Day</td>
<td valign="top" align="char" char=".">0.8697</td>
<td valign="top" align="char" char=".">0.555839</td>
<td valign="top" align="center">0.193032</td>
</tr>
<tr>
<td valign="top" align="left">7 Day</td>
<td valign="top" align="char" char=".">0.355159</td>
<td valign="top" align="char" char=".">0.069943</td>
<td valign="top" align="center">0.384485</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For the topic “interest rate and reserve rate cut”, since such cuts are a measure the Chinese government uses to boost the Chinese stock market, this paper examines the correlation between the public mood time series and the close price of the Shanghai Stock Exchange Composite Index (SSECI). The results are shown in Table <xref rid="x1-90074">4</xref>. With a p-value of 0.007864, the lag value <inline-formula><mml:math id="math028">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> of the emotion “Happiness” is significantly correlated with the price movement of SSECI at the 0.05 significance level. For the “Good” time series, the lag value <inline-formula><mml:math id="math029">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">Z</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is also correlated at the 0.1 significance level, with a p-value of 0.0676. In addition, the lag values at <inline-formula><mml:math id="math030">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="math031">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn></mml:math></inline-formula> of the “Disgust” time series also have significant causal relationships with the SSECI value, with p-values of 0.041898 and 0.008356, respectively. Note that these experiments do not test actual causation, only whether one time series carries predictive information about the price movement. The results show that the null hypothesis that the stock price movement is not correlated with changes in public mood extracted from Weibo can be rejected with high confidence, and that the “Happiness”, “Good” and “Disgust” time series could be used to predict the price movement of SSECI. One likely reason why only “Happiness”, “Good” and “Disgust” are correlated with the stock price lies in the characteristics of Weibo: the proposed Chinese sentiment analysis model classifies most posts into these three dimensions. Another possible reason is that investors’ decisions are more likely to be affected by strong, highly polarized emotions such as “Happiness” and “Disgust”. In addition, such strong emotions spread more quickly on social media websites because they attract more attention than other kinds of emotions, resulting in more posts that carry them.</p>
</sec>
<sec id="x1-100007">
<label>7.</label>
<title>A nonlinear autoregressive model with exogenous sentiment inputs for stock prediction</title>
<p>Although the Granger causality analysis shows that certain public mood dimensions are correlated with specific stock price movements in China, the analysis is based on linear regression, whereas the relation between public mood and the stock market is more likely to be nonlinear. Artificial neural networks have been shown to be effective at modeling nonlinear time series [<xref ref-type="bibr" rid="ref016">16</xref>,<xref ref-type="bibr" rid="ref018">18</xref>], and some researchers have already applied neural networks to stock price prediction [<xref ref-type="bibr" rid="ref017">17</xref>]. Zhu et al. indicate that neural network models augmented with trading volumes improve forecasting performance across different forecasting horizons [<xref ref-type="bibr" rid="ref036">36</xref>]. In addition, specific types of neural network, such as the self-organizing dynamic neural network, have been used to forecast exchange rates [<xref ref-type="bibr" rid="ref019">19</xref>]. Whereas previous literature [<xref ref-type="bibr" rid="ref036">36</xref>] uses trading volume as an additional input to improve the performance of neural networks, this paper investigates whether the public mood states from Weibo can reinforce the forecasting ability of neural networks.</p>
<sec id="x1-110007.1">
<label>7.1.</label>
<title>Model set up</title>
<fig id="x1-110015">
<label>Fig. 5.</label>
<caption>
<p>The nonlinear autoregressive model with exogenous sentiment inputs.</p>
</caption>
<graphic xlink:href="345f05.jpg"/>
</fig>
<p>An overview of the nonlinear autoregressive model with exogenous sentiment inputs is shown in Fig. <xref rid="x1-110015">5</xref>. The model requires two data sets: historical stock prices, which can be downloaded from the Internet, and public posts from Sina Weibo. Because the purpose of the model is to capture the direction of stock price movement and predict the future trend, the raw price of a stock is first transformed into its 5-day moving average. Moving averages are widely used in previous research to predict the trend of stock price movement [<xref ref-type="bibr" rid="ref004">4</xref>,<xref ref-type="bibr" rid="ref015">15</xref>,<xref ref-type="bibr" rid="ref028">28</xref>]. The first input time series is therefore the 5-day moving average, for example, <inline-formula><mml:math id="math032">
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math></inline-formula> for the Shanghai Connect Index (000159), where <italic>t</italic> is the total number of days in the testing period.</p>
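<p>The price input can be sketched as a trailing 5-day moving average. This minimal Python sketch assumes a simple (unweighted) moving average that drops the first four days rather than padding them, and adds a helper that reads off the day-to-day direction of the smoothed series; the function names and the price list are hypothetical.</p>
<preformat preformat-type="code">
```python
def moving_average(prices, window=5):
    """Trailing simple moving average; the first window - 1 days are dropped."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def trend_direction(series):
    """+1 if the series rises from one day to the next, -1 if it falls, 0 if flat."""
    return [(b > a) - (a > b) for a, b in zip(series, series[1:])]


# Hypothetical daily close prices.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
ma5 = moving_average(prices)
print(ma5)  # [12.0, 13.0, 14.0]
print(trend_direction(ma5))  # [1, 1]
```
</preformat>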
<p>The other part of the model generates topic-based sentiment score time series. The Granger causality analysis in the previous section indicates that the stock price is related to the mood indicators for specific topics. According to the efficient market hypothesis, stock prices are driven by new information. In the proposed model, new information is considered topic-related: new information on the specific topics related to a stock, together with the public’s reactions to it, drives the price movement. To generate the sentiment score inputs, the first step is to identify all topics related to a stock. For example, the topic “Shanghai Hong Kong Connect and Shenzhen Hong Kong connect” is highly related to the Shanghai Stock Connect Index (000159). Then, for each topic, a list of correlated keywords <inline-formula><mml:math id="math033">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">m</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math></inline-formula> is extracted from Sina Weibo. The model then uses these keywords to filter out irrelevant posts, so that each topic has an associated set of posts from which its topic-based sentiment score is generated. Finally, the new Chinese sentiment analysis method presented in Section <xref rid="x1-70004">4</xref> is applied to calculate the daily sentiment score of each topic and to generate the sentiment score time series for the different mood dimensions.</p>
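<p>The keyword filtering and daily aggregation steps can be sketched as follows. This is a hypothetical illustration, not the authors’ pipeline: the substring keyword match, the daily mean, and the pluggable <monospace>score_post</monospace> function (a stand-in for the Chinese sentiment analysis method of Section <xref rid="x1-70004">4</xref>, which is not reproduced here) are all assumptions, and the example keywords 沪港通 and 深港通 (the Chinese names of the two stock connect schemes) are ours, not the paper’s keyword list.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple

# A post is (publication date, post text); this layout is an assumption.
Post = Tuple[date, str]


def filter_posts(posts: Iterable[Post], keywords: List[str]) -> List[Post]:
    """Keep only posts that contain at least one topic keyword (plain substring match)."""
    return [(d, text) for d, text in posts if any(k in text for k in keywords)]


def daily_topic_scores(
    posts: Iterable[Post],
    keywords: List[str],
    score_post: Callable[[str], Dict[str, float]],
) -> Dict[date, Dict[str, float]]:
    """Average the per-post mood scores of one topic's posts for each day.

    score_post is a hypothetical stand-in for the Section 4 sentiment method;
    it is assumed to return a score for every mood dimension of a post.
    """
    sums: Dict[date, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[date, int] = defaultdict(int)
    for d, text in filter_posts(posts, keywords):
        counts[d] += 1
        for mood, value in score_post(text).items():
            sums[d][mood] += value
    # Days come back in chronological order; days without matching posts are absent.
    return {
        d: {mood: total / counts[d] for mood, total in moods.items()}
        for d, moods in sorted(sums.items())
    }
```
</preformat>
<p>With a toy scorer that gives a “Happiness” score of 1.0 to any post containing 涨 (“rise”) and 0.0 otherwise, two matching posts on one day, only one of which contains 涨, give a daily score of 0.5; posts that mention neither keyword are ignored.</p>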
<table-wrap id="x1-110025">
<label>Table 5</label>
<caption>
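<p>Stock price prediction results of the models with (<italic>I</italic><sub>1</sub>) and without (<italic>I</italic><sub>0</sub>) the sentiment input</p>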
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Dataset</td>
<td valign="top" align="center" colspan="2">000159</td>
<td valign="top" align="center" colspan="2">388</td>
<td valign="top" align="center" colspan="2">Hang Seng Index</td>
</tr>
<tr>
<td valign="top"><hr/></td>
<td valign="top" colspan="2"><hr/></td>
<td valign="top" colspan="2"><hr/></td>
<td valign="top" colspan="2"><hr/></td>
</tr>
<tr>
<td valign="top" align="left">Models</td>
<td valign="top" align="center"><inline-formula><mml:math id="math034">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="math035">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="math036">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="math037">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="math038">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="math039">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">MSE</td>
<td valign="top" align="center">286.2919</td>
<td valign="top" align="center">460.3728</td>
<td valign="top" align="center">0.5796</td>
<td valign="top" align="center">0.9306</td>
<td valign="top" align="center">3646</td>
<td valign="top" align="center">6755.2</td>
</tr>
<tr>
<td valign="top" align="left">MAPE</td>
<td valign="top" align="center">0.0039</td>
<td valign="top" align="center">0.0064</td>
<td valign="top" align="center">0.0034</td>
<td valign="top" align="center">0.0044</td>
<td valign="top" align="center">0.0024</td>
<td valign="top" align="center">0.0033</td>
</tr>
<tr>
<td valign="top" align="left">Direction (%)</td>
<td valign="top" align="center">89.66</td>
<td valign="top" align="center">75.86</td>
<td valign="top" align="center">77.14</td>
<td valign="top" align="center">62.86</td>
<td valign="top" align="center">87.1</td>
<td valign="top" align="center">70.9</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The sentiment scores are normalized to z-scores, as in Section <xref rid="x1-90006">6</xref>. Granger causality analysis is then used to filter out unrelated mood dimensions, so that only useful time series remain. For example, for the Shanghai Stock Connect Index (000159), the earlier Granger analysis showed that the “Happiness” time series for the topic “Shanghai Hong Kong stock connect and Shenzhen Hong Kong stock connect” is related to the stock price. It can therefore be used as an exogenous sentiment input in the proposed model, whereas other mood dimensions such as “Disgust” are ignored. The Chinese sentiment analysis method proposed in Section <xref rid="x1-70004">4</xref> generates 7 time series for each topic, one for each of the 7 mood dimensions. However, not every dimension is useful, and it is possible that none of the 7 dimensions is related to the stock price. Hence, as shown in Fig. <xref rid="x1-110015">5</xref>, the number of sentiment inputs, <italic>i</italic>, is in general not equal to the number of topics, <italic>n</italic>, associated with the stock. Once all related sentiment score time series have been obtained, the model combines them with the moving average time series as inputs and uses a neural network to predict the future price.</p>
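<p>The normalization and selection step might look like the sketch below. It assumes the z-score is taken over the whole series with the population standard deviation, whereas Section <xref rid="x1-90006">6</xref> may define it differently (for example over a sliding window), and it assumes a 5 percent significance level; the Granger p-values would come from a separate causality test and are simply passed in as numbers. The names <monospace>z_score</monospace> and <monospace>select_inputs</monospace> are hypothetical.</p>
<preformat preformat-type="code">
```python
from statistics import mean, pstdev
from typing import Dict, List


def z_score(series: List[float]) -> List[float]:
    """Standardize a series to zero mean and unit population standard deviation.

    Assumption: statistics are taken over the whole series, not a sliding window.
    """
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-scored")
    return [(x - mu) / sigma for x in series]


def select_inputs(
    mood_series: Dict[str, List[float]],
    granger_p: Dict[str, float],
    alpha: float = 0.05,  # assumed significance level
) -> Dict[str, List[float]]:
    """Return z-scored series for the mood dimensions that pass the Granger test.

    A dimension is kept only if its p-value is below alpha; dimensions with no
    recorded p-value are treated as unrelated and dropped.
    """
    return {
        mood: z_score(series)
        for mood, series in mood_series.items()
        if alpha > granger_p.get(mood, 1.0)
    }
```
</preformat>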
<p>A back-propagation feed-forward neural network with multiple hidden layers is used to build the nonlinear autoregressive exogenous model. To keep the comparison of model performance unbiased, the same neural network parameter settings are used for all models. To predict the moving average price of a stock on day <italic>t</italic>, the historical prices of the stock over the past <italic>n</italic> days and the past values of the selected public mood time series are used as input attributes. Because the Granger analysis in the previous section suggests that the most related time lag is <inline-formula><mml:math id="math040">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>3</mml:mn></mml:math></inline-formula> or <inline-formula><mml:math id="math041">
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>4</mml:mn></mml:math></inline-formula>, <italic>n</italic> is set to 5 in the neural network model, so that both lags are covered. Each hidden layer has 10 neurons. The network is trained with the Levenberg-Marquardt algorithm, a standard method for solving nonlinear least-squares problems, and the mean square error (MSE) is used to monitor and compare model performance. The network has one input layer, one output layer and multiple hidden layers. As shown in Fig. <xref rid="x1-110015">5</xref>, the inputs of the neural network are the moving average price time series and the different topic-based sentiment score time series, e.g. <inline-formula><mml:math id="math042">
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">x</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">i</mml:mi>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="italic">n</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo></mml:math></inline-formula>, and the output is the moving average price at the next step together with the direction of the price movement (up or down). Because the model is topic-based, it can easily be extended by adding new topics or sentiment inputs. Every topic has a life cycle, so the number of topics in the model is not fixed, and the list of topics related to a stock changes from day to day. In this sense the model is self-adaptive: a topic that is no longer related to the stock can be removed, and a new topic can be added when new information or news is announced. This paper does not discuss how the topic list is built; it focuses only on whether the model with exogenous sentiment inputs is useful.</p>
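<p>The input layout described above can be sketched as follows. This minimal illustration only builds the lagged input rows and the targets; it assumes the moving-average and sentiment series are aligned by trading day, labels a day as “up” only when the moving average strictly rises (an assumption, since the treatment of unchanged values is not stated), and does not reproduce the multi-layer network or the Levenberg-Marquardt training. The function name <monospace>narx_dataset</monospace> is hypothetical.</p>
<preformat preformat-type="code">
```python
from typing import List, Sequence, Tuple


def narx_dataset(
    price: Sequence[float],
    exog: Sequence[Sequence[float]],
    n: int = 5,
) -> Tuple[List[List[float]], List[float], List[int]]:
    """Build lagged NARX input rows from day-aligned series.

    For every day t with n days of history, the input row is
    [price[t-1], ..., price[t-n], x_1[t-1], ..., x_1[t-n], ..., x_i[t-1], ..., x_i[t-n]],
    the regression target is price[t], and the direction label is
    1 if price[t] rises above price[t-1] and 0 otherwise (assumed rule).
    An empty exog list gives the inputs of the plain autoregressive model.
    """
    for series in exog:
        if len(series) != len(price):
            raise ValueError("all series must be aligned day by day")
    rows, targets, directions = [], [], []
    for t in range(n, len(price)):
        row = [price[t - k] for k in range(1, n + 1)]
        for series in exog:
            row.extend(series[t - k] for k in range(1, n + 1))
        rows.append(row)
        targets.append(price[t])
        directions.append(1 if price[t] > price[t - 1] else 0)
    return rows, targets, directions
```
</preformat>
<p>With <monospace>n = 5</monospace> and one sentiment series, each row holds 10 values, corresponding to the lags <italic>t</italic> − 5, …, <italic>t</italic> − 1 of the price and of the sentiment series in Eq. (<xref rid="x1-120015">5</xref>).</p>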
<fig id="x1-110036">
<label>Fig. 6.</label>
<caption>
<p>Model outputs compared with the expected (target) values for 388.</p>
</caption>
<graphic xlink:href="345f06.jpg"/>
</fig>
</sec>
<sec id="x1-120007.2">
<label>7.2.</label>
<title>Experimental results</title>
<p>To test the nonlinear autoregressive model with exogenous sentiment inputs, the training data set is the collection of Weibo posts from 1 January 2015 to 31 December 2015, and the testing period runs from January 2016 to March 2016. As discussed in Section <xref rid="x1-90006">6</xref>, the Granger causality analysis showed that the “Happiness” time series derived from the topic “Shanghai Hong Kong stock connect and Shenzhen Hong Kong stock connect” is highly related to the prices of 000159, 388 and HSI, whereas the other time series are not. Therefore, in this experiment only the “Happiness” time series is used as an exogenous sentiment input for predicting the price movement of 000159, 388 and HSI. Table <xref rid="x1-110025">5</xref> reports the test results. To assess the contribution of the sentiment input, two nonlinear autoregressive models are applied to the same data set: one with the “Happiness” sentiment score time series as an input and one without it. The results show that adding the sentiment time series as an input improves every measure in Table <xref rid="x1-110025">5</xref> for all three data sets. The nonlinear models used in the experiment are given in Eq. (<xref rid="x1-120015">5</xref>). <inline-formula><mml:math id="math043">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> denotes the plain nonlinear autoregressive model without sentiment inputs. <inline-formula><mml:math id="math044">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> represents the proposed model with exogenous sentiment inputs. In the experiment, only the “Happiness” time series associated with the topic “Shanghai Hong Kong stock connect and Shenzhen Hong Kong stock connect”, <inline-formula><mml:math id="math045">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula> is used as the sentiment input. If more related topics and their sentiment time series could be detected, the performance might improve further. However, the aim of this experiment is to examine the ability of the new model and whether adding the sentiment score time series improves the prediction; finding all the related topics for a stock is beyond the scope of this paper. 
<disp-formula>
<mml:math display="block" id="math046">
<mml:mtable displaystyle="true"><mml:mlabeledtr>
<mml:mtd id="x1-120015">
<mml:mtext>(5)</mml:mtext>
</mml:mtd>
<mml:mtd>
<mml:mtable displaystyle="true" columnspacing="0pt 10pt 0pt 10pt 0pt 10pt 0pt 10pt 0pt 10pt">
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>159</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>388</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mn>388</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">HSI</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd columnalign="right"/>
<mml:mtd columnalign="left">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo fence="true" stretchy="false">{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">HSI</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">X</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>5</mml:mn>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mo>…</mml:mo>
<mml:mo mathvariant="normal">,</mml:mo>
<mml:mi mathvariant="italic">t</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo fence="true" stretchy="false">}</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mtd>
</mml:mlabeledtr></mml:mtable></mml:math>
</disp-formula>
</p>
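<p>The chronological split between training and testing data can be sketched as below. The exact first and last days of the testing period are assumed (1 January and 31 March 2016), since the text gives only the months; splitting by date rather than at random keeps observations from the testing period out of training. The constant and function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from datetime import date
from typing import List, Tuple

# Assumed period boundaries; the text gives the testing period only as months.
TRAIN_START, TRAIN_END = date(2015, 1, 1), date(2015, 12, 31)
TEST_START, TEST_END = date(2016, 1, 1), date(2016, 3, 31)


def split_by_period(days: List[date]) -> Tuple[List[int], List[int]]:
    """Return the indices of training days and of testing days.

    Days outside both periods are dropped and chronological order is kept,
    so no observation from the testing period leaks into training.
    """
    train = [i for i, d in enumerate(days) if TRAIN_END >= d >= TRAIN_START]
    test = [i for i, d in enumerate(days) if TEST_END >= d >= TEST_START]
    return train, test
```
</preformat>
<p>The first test rows still draw their lagged inputs from the last trading days of 2015, which is legitimate because those values are known before the day being predicted.</p>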
<p>As shown in Table <xref rid="x1-110025">5</xref>, for the Shanghai Stock Connect Index (000159), the mean square error over the testing period is 460.3728 for the plain nonlinear autoregressive model <inline-formula><mml:math id="math047">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula>. Adding the “Happiness” time series as another input reduces the mean square error to 286.2919. As for 000159, the mean square error decreases for stock 388 and the Hang Seng Index as well, from 0.9306 to 0.5796 for 388 and from 6755.2 to 3646 for HSI. The mean absolute percentage error (MAPE) also decreases, from 0.0064 to 0.0039 for 000159, from 0.0044 to 0.0034 for 388 and from 0.0033 to 0.0024 for HSI. These results indicate that adding a single sentiment time series as an input improves prediction performance. For investment decision making, it is important to know the future direction of a stock, and the moving average represents the direction in which a stock moves over a certain period. To evaluate the ability of the proposed model to predict the direction of price movement, the directions determined by the model (daily price up or down) are compared with the actual directions in the historical data. Table <xref rid="x1-110025">5</xref> shows that the direction accuracy is high: the highest accuracy, 89.66 percent, is obtained for 000159, where the predicted direction is wrong on only a few days. As with the mean square error, adding the sentiment time series as an input increases the accuracy, from 75.86 to 89.66 percent for 000159, from 62.86 to 77.14 percent for 388 and from 70.9 to 87.1 percent for the Hang Seng Index.</p>
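<p>The three evaluation measures in Table <xref rid="x1-110025">5</xref> can be computed as in the sketch below. The MSE and MAPE formulas are the standard ones, with MAPE returned as a fraction to match the scale of Table <xref rid="x1-110025">5</xref>; the direction rule, which compares both the predicted and the actual value on day <italic>t</italic> with the actual value on day <italic>t</italic> − 1, is our assumption, because the paper does not spell out how the predicted direction is derived. The function names are hypothetical.</p>
<preformat preformat-type="code">
```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.0039 means 0.39 percent)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def direction_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Share of days whose predicted up/down move matches the actual move.

    Assumption: both moves are measured against the previous day's actual
    value, so the first day (which has no predecessor) is skipped.
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] > actual[t - 1]
        predicted_up = predicted[t] > actual[t - 1]
        if actual_up == predicted_up:
            hits += 1
    return hits / (len(actual) - 1)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop of an error measure, e.g. the MSE of I0 versus I1."""
    return (baseline - improved) / baseline
```
</preformat>
<p>Applied to the Table <xref rid="x1-110025">5</xref> figures, <monospace>relative_reduction(460.3728, 286.2919)</monospace> is about 0.378, that is, adding the sentiment input lowers the MSE for 000159 by roughly 38 percent; the corresponding reductions are about 38 percent for 388 and about 46 percent for the Hang Seng Index.</p>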
<p>Looking into the details of the prediction results, Table <xref rid="x1-120026">6</xref> shows the precision, recall and F1 value for predicting upward price movements with and without the sentiment input. After the sentiment time series is added to the inputs, precision, recall and F1 value all improve for the Shanghai Stock Connect Index (000159): precision rises from 0.769231 to 0.923077, recall from 0.714286 to 0.857143, and the F1 value from 0.740741 to 0.888889. This indicates that the new model identifies upward movements more reliably, and misclassifies fewer price movement directions, than the plain nonlinear autoregressive model without exogenous sentiment inputs.</p>
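<p>The precision, recall and F1 values in Table <xref rid="x1-120026">6</xref> follow from the usual confusion-matrix definitions for the “up” class. In the sketch below, the counts (true positives, false positives, false negatives) are hypothetical: the paper does not report them, and they were chosen only because they reproduce the Table <xref rid="x1-120026">6</xref> values.</p>
<preformat preformat-type="code">
```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the 'up' class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (tp, fp, fn) counts, not reported in the paper, that reproduce Table 6.
for label, counts in (("I1", (12, 1, 2)), ("I0", (10, 3, 4))):
    p, r, f = precision_recall_f1(*counts)
    print(f"{label}: precision={p:.6f} recall={r:.6f} F1={f:.6f}")
# I1: precision=0.923077 recall=0.857143 F1=0.888889
# I0: precision=0.769231 recall=0.714286 F1=0.740741
```
</preformat>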
<table-wrap id="x1-120026">
<label>Table 6</label>
<caption>
<p>Precision, recall and F1 value for predicting upward price movements of 000159</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left"/>
<td valign="top" align="center"><inline-formula><mml:math id="math048">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
<td valign="top" align="center"><inline-formula><mml:math id="math049">
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="italic">I</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>0</mml:mn>
</mml:mrow>
</mml:msub></mml:math></inline-formula></td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Precision</td>
<td valign="top" align="center">0.923077</td>
<td valign="top" align="center">0.769231</td>
</tr>
<tr>
<td valign="top" align="left">Recall</td>
<td valign="top" align="center">0.857143</td>
<td valign="top" align="center">0.714286</td>
</tr>
<tr>
<td valign="top" align="left">F1 value</td>
<td valign="top" align="center">0.888889</td>
<td valign="top" align="center">0.740741</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Figure <xref rid="x1-110036">6</xref> visualizes the performance of the model by comparing the prediction output of the proposed model with the expected (target) values for stock 388. The output price and the target price follow almost the same curve and move in the same directions. The difference between the output price and the target price is small, which illustrates the prediction ability of the proposed model: most errors lie in the range −2 to 2, only a few fall outside this range, and the largest error is only around 5.</p>
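<p>The error statistics quoted for Fig. <xref rid="x1-110036">6</xref> (the share of errors within ±2 and the largest error) can be computed as in this hypothetical sketch, assuming the error on each plotted day is the model output minus the target price; the function name <monospace>error_summary</monospace> is our own.</p>
<preformat preformat-type="code">
```python
from typing import Sequence, Tuple


def error_summary(
    target: Sequence[float], output: Sequence[float], band: float = 2.0
) -> Tuple[float, float]:
    """Return (share of errors inside [-band, band], largest absolute error).

    Assumption: error = model output - target price, one value per day.
    """
    errors = [o - t for t, o in zip(target, output)]
    inside = sum(1 for e in errors if band >= abs(e))
    return inside / len(errors), max(abs(e) for e in errors)
```
</preformat>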
</sec>
</sec>
<sec id="x1-130008">
<label>8.</label>
<title>Conclusion</title>
<p>Since China’s stock market is already one of the largest in the world, it is important to understand how its behavior relates to public mood states. This paper investigates whether public mood states derived from a large-scale collection of Weibo posts on weibo.com are correlated with, or predictive of, the prices of different stocks in the Chinese market. The results demonstrate that public mood states can indeed be tracked by applying the proposed Chinese sentiment analysis method to large-scale Weibo posts. In addition, the Granger causality analysis shows that the null hypothesis that public mood does not Granger-cause the stock price can be rejected, so that the time series of public mood states is predictive of the price movement of different stocks. Among the 7 observed emotions, only some Granger-cause stock price movement: “Happiness” and “Disgust” have significant Granger-causal relationships with the prices of different stocks, and their movements are correlated with price movements occurring 3–4 days later. A similar time lag of about 3 days was also reported in research using public mood on Twitter to predict stock prices [<xref ref-type="bibr" rid="ref003">3</xref>]. Furthermore, the nonlinear autoregressive model with exogenous sentiment inputs demonstrates that public mood states can be used to predict stock price movement in China: they improve prediction accuracy even with a basic neural network model. If more related topics for a stock are found, the more complete topic-based sentiment inputs could yield higher accuracy, and a more sophisticated model built on topic-based mood may improve the prediction results further.</p>
<p>In the future, more experiments will be conducted to understand why the time lag exists and how to account for it, or exploit it, in real-time algorithmic trading. Future research will also examine other important factors, such as geo-location effects and spam detection. In addition, this paper only discusses the relationship between the closing prices of stocks and the daily movement of public mood; how intraday changes in public mood affect real-time stock price movement will also be investigated in future research.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgement</title>
<p>This paper is an extended version of the conference paper “Weibo Mood towards Stock Market” in SeCoP 2016. This work was supported by the Tip-top Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015TQ01X633), (STPP-GD) (Grant No. 2015A070711001) and (SF-STRDA-GD) (Grant No. 2016B010124011).</p></ack>
<ref-list>
<title>References</title>
<ref id="ref001">
<label>[1]</label><mixed-citation publication-type="other"><string-name><given-names>A.</given-names> <surname>Abbasi</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>, Cybergate: A design framework and system for text analysis of computer-mediated communication, <italic>MIS Quarterly</italic> (2008), 811–837. </mixed-citation>
</ref>
<ref id="ref002">
<label>[2]</label><mixed-citation publication-type="other"><string-name><given-names>K.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Cheng</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Almas</surname></string-name>, <chapter-title>Multi-lingual sentiment analysis of financial news streams</chapter-title>, in: <source>Proceedings of the 1st International Conference on Grid in Finance</source>, <year>2006</year>. </mixed-citation>
</ref>
<ref id="ref003">
<label>[3]</label><mixed-citation publication-type="journal"><string-name><given-names>J.</given-names> <surname>Bollen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Mao</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Zeng</surname></string-name>, <article-title>Twitter mood predicts the stock market</article-title>, <source>Journal of Computational Science</source> <volume>2</volume>(<issue>1</issue>) (<year>2011</year>), <fpage>1</fpage>–<lpage>8</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.jocs.2010.12.007" xlink:type="simple">10.1016/j.jocs.2010.12.007</ext-link>. </mixed-citation>
</ref>
<ref id="ref004">
<label>[4]</label><mixed-citation publication-type="journal"><string-name><given-names>W.</given-names> <surname>Brock</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lakonishok</surname></string-name> and <string-name><given-names>B.</given-names> <surname>LeBaron</surname></string-name>, <article-title>Simple technical trading rules and the stochastic properties of stock returns</article-title>, <source>The Journal of Finance</source> <volume>47</volume>(<issue>5</issue>) (<year>1992</year>), <fpage>1731</fpage>–<lpage>1764</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1111/j.1540-6261.1992.tb04681.x" xlink:type="simple">10.1111/j.1540-6261.1992.tb04681.x</ext-link>. </mixed-citation>
</ref>
<ref id="ref005">
<label>[5]</label><mixed-citation publication-type="journal"><string-name><given-names>M.</given-names> <surname>Chau</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Xu</surname></string-name>, <article-title>Mining communities and their relationships in blogs: A study of hate groups</article-title>, <source>International Journal of Human-Computer Studies</source> <volume>65</volume>(<issue>1</issue>) (<year>2007</year>), <fpage>57</fpage>–<lpage>70</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.ijhcs.2006.08.009" xlink:type="simple">10.1016/j.ijhcs.2006.08.009</ext-link>. </mixed-citation>
</ref>
<ref id="ref006">
<label>[6]</label><mixed-citation publication-type="other"><string-name><given-names>J.M.</given-names> <surname>Chen</surname></string-name>, The construction and application of Chinese emotion word ontology, Master’s thesis, Dalian University of Technology, 2008. </mixed-citation>
</ref>
<ref id="ref007">
<label>[7]</label><mixed-citation publication-type="journal"><string-name><given-names>R.J.</given-names> <surname>Dolan</surname></string-name>, <article-title>Emotion, cognition, and behavior</article-title>, <source>Science</source> <volume>298</volume>(<issue>5596</issue>) (<year>2002</year>), <fpage>1191</fpage>–<lpage>1194</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1126/science.1076358" xlink:type="simple">10.1126/science.1076358</ext-link>. </mixed-citation>
</ref>
<ref id="ref008">
<label>[8]</label><mixed-citation publication-type="journal"><string-name><given-names>E.F.</given-names> <surname>Fama</surname></string-name>, <article-title>The behavior of stock-market prices</article-title>, <source>The Journal of Business</source> <volume>38</volume>(<issue>1</issue>) (<year>1965</year>), <fpage>34</fpage>–<lpage>105</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1086/294743" xlink:type="simple">10.1086/294743</ext-link>. </mixed-citation>
</ref>
<ref id="ref009">
<label>[9]</label><mixed-citation publication-type="journal"><string-name><given-names>E.F.</given-names> <surname>Fama</surname></string-name>, <article-title>The behavior of stock-market prices</article-title>, <source>The Journal of Business</source> <volume>38</volume>(<issue>1</issue>) (<year>1965</year>), <fpage>34</fpage>–<lpage>105</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1086/294743" xlink:type="simple">10.1086/294743</ext-link>. </mixed-citation>
</ref>
<ref id="ref010">
<label>[10]</label><mixed-citation publication-type="journal"><string-name><given-names>E.F.</given-names> <surname>Fama</surname></string-name>, <article-title>Efficient capital markets: II</article-title>, <source>The Journal of Finance</source> <volume>46</volume>(<issue>5</issue>) (<year>1991</year>), <fpage>1575</fpage>–<lpage>1617</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1111/j.1540-6261.1991.tb04636.x" xlink:type="simple">10.1111/j.1540-6261.1991.tb04636.x</ext-link>. </mixed-citation>
</ref>
<ref id="ref011">
<label>[11]</label><mixed-citation publication-type="journal"><string-name><given-names>R.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Xu</surname></string-name>, <article-title>Anger is more influential than joy: Sentiment correlation in Weibo</article-title>, <source>PLoS ONE</source> <volume>9</volume>(<issue>10</issue>) (<year>2014</year>), <elocation-id>e110184</elocation-id>. doi:<ext-link ext-link-type="doi" xlink:href="10.1371/journal.pone.0110184" xlink:type="simple">10.1371/journal.pone.0110184</ext-link>. </mixed-citation>
</ref>
<ref id="ref012">
<label>[12]</label><mixed-citation publication-type="chapter"><string-name><given-names>Q.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Abel</surname></string-name>, <string-name><given-names>G.J.</given-names> <surname>Houben</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Yu</surname></string-name>, <chapter-title>A comparative study of users’ microblogging behavior on Sina Weibo and Twitter</chapter-title>, in: <source>User Modeling, Adaptation, and Personalization</source>, <publisher-name>Springer</publisher-name>, <year>2012</year>, pp. <fpage>88</fpage>–<lpage>101</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1007/978-3-642-31454-4_8" xlink:type="simple">10.1007/978-3-642-31454-4_8</ext-link>. </mixed-citation>
</ref>
<ref id="ref013">
<label>[13]</label><mixed-citation publication-type="chapter"><string-name><given-names>E.</given-names> <surname>Gilbert</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Karahalios</surname></string-name>, <chapter-title>Widespread worry and the stock market</chapter-title>, in: <source>Proceedings of the International AAAI Conference on Weblogs and Social Media</source>, <year>2010</year>, pp. <fpage>59</fpage>–<lpage>65</lpage>. </mixed-citation>
</ref>
<ref id="ref014">
<label>[14]</label><mixed-citation publication-type="chapter"><string-name><given-names>D.</given-names> <surname>Gruhl</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Guha</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Novak</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Tomkins</surname></string-name>, <chapter-title>The predictive power of online chatter</chapter-title>, in: <source>Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining</source>, <year>2005</year>, pp. <fpage>78</fpage>–<lpage>87</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1145/1081870.1081883" xlink:type="simple">10.1145/1081870.1081883</ext-link>. </mixed-citation>
</ref>
<ref id="ref015">
<label>[15]</label><mixed-citation publication-type="journal"><string-name><given-names>A.</given-names> <surname>Gunasekarage</surname></string-name> and <string-name><given-names>D.M.</given-names> <surname>Power</surname></string-name>, <article-title>The profitability of moving average trading rules in South Asian stock markets</article-title>, <source>Emerging Markets Review</source> <volume>2</volume>(<issue>1</issue>) (<year>2001</year>), <fpage>17</fpage>–<lpage>33</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/S1566-0141(00)00017-0" xlink:type="simple">10.1016/S1566-0141(00)00017-0</ext-link>. </mixed-citation>
</ref>
<ref id="ref016">
<label>[16]</label><mixed-citation publication-type="journal"><string-name><given-names>K.</given-names> <surname>Huarng</surname></string-name> and <string-name><given-names>T.H.-K.</given-names> <surname>Yu</surname></string-name>, <article-title>The application of neural networks to forecast fuzzy time series</article-title>, <source>Physica A: Statistical Mechanics and Its Applications</source> <volume>363</volume>(<issue>2</issue>) (<year>2006</year>), <fpage>481</fpage>–<lpage>491</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.physa.2005.08.014" xlink:type="simple">10.1016/j.physa.2005.08.014</ext-link>. </mixed-citation>
</ref>
<ref id="ref017">
<label>[17]</label><mixed-citation publication-type="chapter"><string-name><given-names>T.</given-names> <surname>Kimoto</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Asakawa</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Yoda</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Takeoka</surname></string-name>, <chapter-title>Stock market prediction system with modular neural networks</chapter-title>, in: <source>Proceedings of the International Joint Conference on Neural Networks</source>, <year>1990</year>, pp. <fpage>1</fpage>–<lpage>6</lpage>. </mixed-citation>
</ref>
<ref id="ref018">
<label>[18]</label><mixed-citation publication-type="other"><string-name><given-names>A.</given-names> <surname>Lapedes</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Farber</surname></string-name>, Nonlinear signal processing using neural networks: Prediction and system modelling, Technical Report LA-UR-87-2662, Los Alamos National Laboratory, 1987. </mixed-citation>
</ref>
<ref id="ref019">
<label>[19]</label><mixed-citation publication-type="journal"><string-name><given-names>G.</given-names> <surname>Leng</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Prasad</surname></string-name> and <string-name><given-names>T.M.</given-names> <surname>McGinnity</surname></string-name>, <article-title>An on-line algorithm for creating self-organizing fuzzy neural networks</article-title>, <source>Neural Networks</source> <volume>17</volume>(<issue>10</issue>) (<year>2004</year>), <fpage>1477</fpage>–<lpage>1493</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.neunet.2004.07.009" xlink:type="simple">10.1016/j.neunet.2004.07.009</ext-link>. </mixed-citation>
</ref>
<ref id="ref020">
<label>[20]</label><mixed-citation publication-type="journal"><string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Deng</surname></string-name>, <article-title>News impact on stock price return via sentiment analysis</article-title>, <source>Knowledge-Based Systems</source> <volume>69</volume> (<year>2014</year>), <fpage>14</fpage>–<lpage>23</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.knosys.2014.04.022" xlink:type="simple">10.1016/j.knosys.2014.04.022</ext-link>. </mixed-citation>
</ref>
<ref id="ref021">
<label>[21]</label><mixed-citation publication-type="journal"><string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>F.L.</given-names> <surname>Wang</surname></string-name>, <article-title>Does summarization help stock prediction? A news impact analysis</article-title>, <source>IEEE Intelligent Systems</source> <volume>30</volume>(<issue>3</issue>) (<year>2015</year>), <fpage>26</fpage>–<lpage>34</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1109/MIS.2015.1" xlink:type="simple">10.1109/MIS.2015.1</ext-link>. </mixed-citation>
</ref>
<ref id="ref022">
<label>[22]</label><mixed-citation publication-type="chapter"><string-name><given-names>H.</given-names> <surname>Liang</surname></string-name>, <string-name><given-names>F.S.</given-names> <surname>Tsai</surname></string-name> and <string-name><given-names>A.T.</given-names> <surname>Kwee</surname></string-name>, <chapter-title>Detecting novel business blogs</chapter-title>, in: <source>Proceedings of the 7th International Conference on Information, Communications and Signal Processing</source>, <year>2009</year>, pp. <fpage>1</fpage>–<lpage>5</lpage>. </mixed-citation>
</ref>
<ref id="ref023">
<label>[23]</label><mixed-citation publication-type="chapter"><string-name><given-names>A.Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Gu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Konana</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Ghosh</surname></string-name>, <chapter-title>Predicting stock price from financial message boards with a mixture of experts framework</chapter-title>, in: <source>Intelligent Data Exploration &amp; Analysis Laboratory</source>, <year>2006</year>, pp. <fpage>1</fpage>–<lpage>14</lpage>. </mixed-citation>
</ref>
<ref id="ref024">
<label>[24]</label><mixed-citation publication-type="chapter"><string-name><given-names>G.</given-names> <surname>Mishne</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Glance</surname></string-name>, <chapter-title>Predicting movie sales from blogger sentiment</chapter-title>, in: <source>AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs</source>, <year>2006</year>, pp. <fpage>155</fpage>–<lpage>158</lpage>. </mixed-citation>
</ref>
<ref id="ref025">
<label>[25]</label><mixed-citation publication-type="journal"><string-name><given-names>C.R.</given-names> <surname>Nelson</surname></string-name> and <string-name><given-names>C.I.</given-names> <surname>Plosser</surname></string-name>, <article-title>Trends and random walks in macroeconomic time series: Some evidence and implications</article-title>, <source>Journal of Monetary Economics</source> <volume>10</volume>(<issue>2</issue>) (<year>1982</year>), <fpage>139</fpage>–<lpage>162</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/0304-3932(82)90012-5" xlink:type="simple">10.1016/0304-3932(82)90012-5</ext-link>. </mixed-citation>
</ref>
<ref id="ref026">
<label>[26]</label><mixed-citation publication-type="journal"><string-name><given-names>J.R.</given-names> <surname>Nofsinger</surname></string-name>, <article-title>Social mood and financial economics</article-title>, <source>Journal of Behavioral Finance</source> <volume>6</volume>(<issue>3</issue>) (<year>2005</year>), <fpage>144</fpage>–<lpage>160</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1207/s15427579jpfm0603_4" xlink:type="simple">10.1207/s15427579jpfm0603_4</ext-link>. </mixed-citation>
</ref>
<ref id="ref027">
<label>[27]</label><mixed-citation publication-type="journal"><string-name><given-names>D.E.</given-names> <surname>O’Leary</surname></string-name>, <article-title>Blog mining-review and extensions: “From each according to his opinion”</article-title>, <source>Decision Support Systems</source> <volume>51</volume>(<issue>4</issue>) (<year>2011</year>), <fpage>821</fpage>–<lpage>830</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.dss.2011.01.016" xlink:type="simple">10.1016/j.dss.2011.01.016</ext-link>. </mixed-citation>
</ref>
<ref id="ref028">
<label>[28]</label><mixed-citation publication-type="journal"><string-name><given-names>P.F.</given-names> <surname>Pai</surname></string-name> and <string-name><given-names>C.S.</given-names> <surname>Lin</surname></string-name>, <article-title>A hybrid ARIMA and support vector machines model in stock price forecasting</article-title>, <source>Omega</source> <volume>33</volume>(<issue>6</issue>) (<year>2005</year>), <fpage>497</fpage>–<lpage>505</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.omega.2004.07.024" xlink:type="simple">10.1016/j.omega.2004.07.024</ext-link>. </mixed-citation>
</ref>
<ref id="ref029">
<label>[29]</label><mixed-citation publication-type="journal"><string-name><given-names>B.</given-names> <surname>Qian</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Rasheed</surname></string-name>, <article-title>Stock market prediction with multiple classifiers</article-title>, <source>Applied Intelligence</source> <volume>26</volume>(<issue>1</issue>) (<year>2007</year>), <fpage>25</fpage>–<lpage>33</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1007/s10489-006-0001-7" xlink:type="simple">10.1007/s10489-006-0001-7</ext-link>. </mixed-citation>
</ref>
<ref id="ref030">
<label>[30]</label><mixed-citation publication-type="journal"><string-name><given-names>Y.</given-names> <surname>Rao</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Jin</surname></string-name>, <string-name><given-names>F.L.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>Q.</given-names> <surname>Li</surname></string-name>, <article-title>Social emotion classification of short text via topic-level maximum entropy model</article-title>, <source>Information &amp; Management</source> (<year>2016</year>). </mixed-citation>
</ref>
<ref id="ref031">
<label>[31]</label><mixed-citation publication-type="journal"><string-name><given-names>R.P.</given-names> <surname>Schumaker</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>, <article-title>Textual analysis of stock market prediction using breaking financial news: The AZFinText system</article-title>, <source>ACM Transactions on Information Systems</source> <volume>27</volume>(<issue>2</issue>) (<year>2009</year>), <fpage>1</fpage>–<lpage>19</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1145/1462198.1462204" xlink:type="simple">10.1145/1462198.1462204</ext-link>. </mixed-citation>
</ref>
<ref id="ref032">
<label>[32]</label><mixed-citation publication-type="journal"><string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Cheng</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Fu</surname></string-name>, <article-title>Mining affective text to improve social media item recommendation</article-title>, <source>Information Processing and Management</source> <volume>51</volume> (<year>2015</year>), <fpage>444</fpage>–<lpage>457</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.ipm.2014.09.002" xlink:type="simple">10.1016/j.ipm.2014.09.002</ext-link>. </mixed-citation>
</ref>
<ref id="ref033">
<label>[33]</label><mixed-citation publication-type="chapter"><string-name><given-names>M.H.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>C.L.</given-names> <surname>Lei</surname></string-name>, <chapter-title>Boosting election prediction accuracy by crowd wisdom on social forums</chapter-title>, in: <source>13th IEEE Annual Consumer Communications &amp; Networking Conference</source>, <year>2016</year>, pp. <fpage>348</fpage>–<lpage>353</lpage>. </mixed-citation>
</ref>
<ref id="ref034">
<label>[34]</label><mixed-citation publication-type="chapter"><string-name><given-names>F.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Yang</surname></string-name>, <chapter-title>Automatic detection of rumor on Sina Weibo</chapter-title>, in: <source>Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics</source>, <publisher-name>ACM</publisher-name>, <year>2012</year>, p. <fpage>13</fpage>. </mixed-citation>
</ref>
<ref id="ref035">
<label>[35]</label><mixed-citation publication-type="journal"><string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Fuehres</surname></string-name> and <string-name><given-names>P.A.</given-names> <surname>Gloor</surname></string-name>, <article-title>Predicting stock market indicators through Twitter “I hope it is not as bad as I fear”</article-title>, <source>Procedia - Social and Behavioral Sciences</source> <volume>26</volume> (<year>2011</year>), <fpage>55</fpage>–<lpage>62</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.sbspro.2011.10.562" xlink:type="simple">10.1016/j.sbspro.2011.10.562</ext-link>. </mixed-citation>
</ref>
<ref id="ref036">
<label>[36]</label><mixed-citation publication-type="journal"><string-name><given-names>X.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Xu</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>, <article-title>Predicting stock index increments by neural networks: The role of trading volume under different horizons</article-title>, <source>Expert Systems with Applications</source> <volume>34</volume>(<issue>4</issue>) (<year>2008</year>), <fpage>3043</fpage>–<lpage>3054</lpage>. doi:<ext-link ext-link-type="doi" xlink:href="10.1016/j.eswa.2007.06.023" xlink:type="simple">10.1016/j.eswa.2007.06.023</ext-link>. </mixed-citation>
</ref>
</ref-list>
</back>
</article>
