Conference item
How much is said in a microblog?: A multilingual inquiry based on Weibo and Twitter
- Abstract:
- This paper presents a multilingual study of three quantities per microblog post: (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much is said in terms of information content, in posts by different organizations in different languages. Focusing on three languages (English, Chinese, and Japanese), this research analyses the Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying "how much can be said" in a digital text based on the openly available Universal Declaration of Human Rights and the translated subtitles from TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information that is actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets, such as Chinese and Japanese, contain more information per character than English, but the actual information content of a microblog text varies depending on both the type of organization and the language of the post. We conclude with a discussion on the design implications of microblog text limits for different languages.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
- Files:
- Accepted manuscript (pdf, 573.1KB)
- Publisher copy:
- 10.1145/2786451.2786486
- Funder:
- Economic and Social Research Council
- Funder identifier:
- https://ror.org/03n0ht308
- Publisher:
- Association for Computing Machinery
- Host title:
- WebSci '15: Proceedings of the ACM Web Science Conference
- Article number:
- 25
- Publication date:
- 2015-06-28
- Acceptance date:
- 2015-01-13
- Event title:
- 2015 ACM Web Science Conference, WebSci 2015
- Event location:
- Oxford, United Kingdom
- Event website:
- http://websci15.org
- Event start date:
- 2015-06-28
- Event end date:
- 2015-07-01
- DOI:
- 10.1145/2786451.2786486
- ISBN:
- 978-1-4503-3672-7
- Language:
- English
- Pubs id:
- pubs:611149
- UUID:
- uuid:fab5b725-9aea-468d-b2d4-592d61d64468
- Local pid:
- pubs:611149
- Source identifiers:
- 611149
- Deposit date:
- 2016-03-19
Terms of use
- Copyright holder:
- Liao et al.
- Copyright date:
- 2015
- Rights statement:
- Copyright is held by the owner/author(s). Publication rights licensed to ACM.
- Notes:
- This is the accepted manuscript version of the paper. The final version is available online from Association for Computing Machinery at https://dx.doi.org/10.1145/2786451.2786486