There is a substantial amount of data generated on the internet every second — posts, comments, photos, and videos. These different data types mean that there is a lot of ground to cover, so let’s focus on one — text.
All social conversations are based on written words — tweets, Facebook posts, comments, online reviews, and so on. Being a social media marketer, a Facebook group/profile moderator, or trying to promote your business on social media requires you to know how your audience reacts to the content you are uploading. One way is to read it all, mark hateful comments, divide them into similar topic groups, calculate statistics and… lose a big chunk of your time just to see that there are thousands of new comments to add to your calculations. Fortunately, there is another solution to this problem — machine learning. From this text you will learn:
- Why do you need specialised tools for social media analyses?
- What can you get from topic modeling, and how is it done?
- How to automatically look for hate speech in comments?
Why are social media texts unique?
Before jumping to the analyses, it is really important to understand why social media texts are so unique:
- Posts and comments are short. Most contain one simple sentence, or even a single word or expression, so a single post carries only a limited amount of information.
- Emojis and smiley faces — used almost exclusively on social media. They give additional details about the author’s emotions and context.
- Slang phrases which make posts resemble spoken language rather than written. It makes statements appear more casual.
These features make social media a distinct source of information that demands special attention when you analyse it with machine learning. In contrast, most open-source machine learning solutions are trained on long, formal text, like Wikipedia articles and other website posts. As a result, these models perform badly on social media data, because they don’t understand the additional forms of expression it contains. This problem is called domain shift and is typical in NLP. Different data also require customised data preparation, called preprocessing. This step consists of cleaning the text of low-value tokens such as URLs or mentions and converting it to a machine-readable format (more about how we do it in Sotrender). This is why it is crucial to use tools created especially for your data source to get the best results.
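The preprocessing step described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the pipeline used at Sotrender: the function name `preprocess` and the three regular expressions are assumptions made for this example. Emojis are kept as separate tokens because, as noted earlier, they carry information about the author’s emotions.

```python
import re

# Hypothetical patterns for this sketch; real pipelines are usually more thorough.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# A token is either a run of word characters or a single symbol (e.g. an emoji).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def preprocess(text):
    """Remove URLs and mentions, lowercase the text and split it into tokens."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text.lower())


print(preprocess("Thanks @anna! See https://example.com 😀"))
# ['thanks', '!', 'see', '😀']
```

Whether to keep punctuation, hashtags or repeated letters depends on the task, so treat the choices above as a starting point.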
Topic Modeling for social media
Machine learning for text analysis (Natural Language Processing) is a vast field with many different model types that can give you insight into your data. One of the areas that can answer the question “what are the topics of given pieces of text?” is topic modeling. Topic models help you understand what people are talking about in general. They do not require a specially prepared data set with predefined topics. They find topics, which are patterns hidden within the data, on their own, without supervision or help, which makes topic modeling an unsupervised machine learning method. This means that it is easy to build a model for each individual problem.
There are lots of different algorithms that can be used for this task, but the most widely used is LDA (Latent Dirichlet Allocation). It is based on word frequencies and the distribution of topics in texts. To put it simply, this method counts words in a given data set and groups them into topics based on their co-occurrence. Then the percentage distribution of topics in each document is calculated. In other words, the method assumes that each text is a mixture of topics, which works great with long documents where every paragraph relates to a different matter, but fits poorly with a post that is only a sentence long.
That’s why social media texts need a different procedure. One of the newer algorithms is GSDMM (a Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model; Yin and Wang, 2014). What makes this one so different?
- It is fast,
- designed for short texts,
- easily explained with an analogy of a teacher (algorithm) that wants to divide students (texts) into groups (topics) of similar interests.
Students are told to write down some movie titles they liked within 2 minutes. Most students are able to list 3–5 movies within this time frame (this corresponds to the limited number of words in a social media text). Then they are randomly assigned to a group. In the last step, every student chooses a table again with two rules in mind:
- pick a group with more students — favours bigger groups
- or a group with the most similar movie titles — makes groups more cohesive.
This last step is repeated multiple times. The first rule, which favours bigger groups, is crucial to ensure that groups are not excessively fragmented. Because each student (text) lists only a few movie titles (words), each group (topic) is bound to have members with different movies in their lists but from the same genre.
As a result of the GSDMM algorithm you obtain an assignment of each text to one topic, as well as a list of the most important words for every topic.
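The procedure described above can be sketched as a compact Gibbs sampler in pure Python. This is an illustrative implementation written for this article, following the sampling formula from Yin and Wang (2014); it is not production code. The function name `gsdmm`, the default values of `K`, `alpha`, `beta` and `n_iters`, and the toy documents are all assumptions. The first factor of the weight rewards bigger groups (rule one) and the second rewards groups that already contain the document’s words (rule two).

```python
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Cluster tokenised short texts; returns (labels, word counts per cluster)."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    D = len(docs)
    m = [0] * K                         # number of documents in each cluster
    n = [0] * K                         # number of words in each cluster
    nw = [Counter() for _ in range(K)]  # per-cluster word counts
    z = []                              # cluster label of each document

    def move(doc, k, sign):
        """Add (sign=+1) or remove (sign=-1) a document from cluster k."""
        m[k] += sign
        n[k] += sign * len(doc)
        for w in doc:
            nw[k][w] += sign

    # Initialisation: every student sits at a random table.
    for doc in docs:
        k = rng.randrange(K)
        z.append(k)
        move(doc, k, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            move(doc, z[d], -1)  # the student leaves the current table
            weights = []
            for k in range(K):
                # Rule 1: prefer tables with more students.
                p = (m[k] + alpha) / (D - 1 + K * alpha)
                # Rule 2: prefer tables whose lists share this student's words.
                seen = Counter()
                for i, w in enumerate(doc):
                    p *= (nw[k][w] + beta + seen[w]) / (n[k] + V * beta + i)
                    seen[w] += 1
                weights.append(p)
            # Plain probabilities are fine for short texts; longer documents
            # would need log-space arithmetic to avoid underflow.
            z[d] = rng.choices(range(K), weights=weights)[0]
            move(doc, z[d], +1)
    return z, nw


docs = [["cat", "dog"], ["dog", "cat", "pet"],
        ["stock", "market"], ["market", "price", "stock"]]
labels, word_counts = gsdmm(docs, K=3)
top_words = [counts.most_common(3) for counts in word_counts]
```

Here `labels` holds the single topic assigned to each text, and `top_words` lists the most frequent words of every topic, which is the raw material for naming topics.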
The tricky part is deciding on the number of topics (a problem of every unsupervised method), but once you do, you can gain quite a lot of insight from the data:
- Distribution of topics in your data
- Word clouds: these allow us to comprehend a topic and name it. It is a quick and easy solution that can replace reading the whole set of texts and spare you hours of tedious work dividing it into sets.
- Time series analysis of topics: as we can see in the plot below, some topics gain more attention, like number 7, and some fade away, like number 4. To grasp what is popular now or may become popular in the future, it helps to look back and see how topics changed in the past.
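The topic distribution and its change over time can be computed with simple counting once every text has a topic label and a publication period. The sketch below is a minimal example under assumptions: the function name `topic_share_by_period`, the `(period, topic)` record format and the sample week labels are hypothetical and not taken from the project described here.

```python
from collections import Counter, defaultdict


def topic_share_by_period(records):
    """records: iterable of (period, topic) pairs -> {period: {topic: share}}."""
    counts = defaultdict(Counter)
    for period, topic in records:
        counts[period][topic] += 1
    return {
        period: {t: c / sum(topics.values()) for t, c in topics.items()}
        for period, topics in sorted(counts.items())
    }


# Hypothetical data: one (week, topic) pair per post or comment.
records = [("2020-W10", 4), ("2020-W10", 4), ("2020-W10", 7), ("2020-W11", 7)]
shares = topic_share_by_period(records)
```

Plotting each topic’s share across periods gives the kind of time series discussed above; using shares rather than raw counts keeps weeks with different volumes comparable.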
In one of our recent projects for Collegium Civitas we analyzed 50 000 social media posts and comments and performed topic analysis on them. It allowed our client to answer questions like:
1) What was discussed in the time span of 2 months in social media?
In the dataset we were able to distinguish 10 different topics revolving around Covid-19. Discussions covered statistics and Covid-19 etiology, everyday life, government response to the pandemic, consequences of travel restrictions, trade market and supplies, health care during the pandemic, church and politics, common knowledge and conspiracy theories about Covid-19, politics and economy, and spam messages and ads.
2) How were the discussions influenced by the pandemic situation?
During the outbreak of the pandemic the biggest theme was the origin and statistics of Covid-19. People talked about how the situation was changing and exchanged information about how the disease spreads. To read more, visit Collegium Civitas’ site (Polish version only).
Hate speech recognition
Another question that can be answered with machine learning is “what kind of emotion do people express in their comments or posts?” or “is my content generating hateful comments?”. There are only a few solutions for these tasks in the Polish language. That is why we built models based on social media text for sentiment and hate speech recognition at Sotrender. Our solutions were built in two steps.
The first step is to convert text and emojis into numerical vector representations (embeddings) to be used later in neural networks. The main goal of this step is to obtain a language model (LM) that captures knowledge of human language, so that vectors representing similar words are close to each other (for example: queen and king, or paragraph and article), which implies that these words have similar meaning (semantic similarity). This property is shown on the graph below.
Training this model is similar to teaching a child how to speak by talking to them. By listening to their parents talk, children are able to grasp the meaning of words, and the more they hear, the more they understand.
Following this analogy, we have to use a huge set of social media texts to train our model to understand their language. That is why we used a set of 100 million posts and comments to train our model, so that it can properly assign vectors to words as well as to emojis. Tokens vectorised with the embeddings model provide the input to the neural network.
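The idea that similar words get vectors that are close to each other is usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration: the numbers in `embeddings` are hypothetical, and a trained model would produce vectors with many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; the values are invented for this example.
embeddings = {
    "queen": [0.9, 0.8, 0.1],
    "king": [0.8, 0.9, 0.2],
    "😀": [0.1, 0.2, 0.9],
}

related = cosine_similarity(embeddings["queen"], embeddings["king"])
unrelated = cosine_similarity(embeddings["queen"], embeddings["😀"])
```

With these toy values `related` comes out higher than `unrelated`, which mirrors the property a well-trained embeddings model should have for words and emojis alike.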
The second step is designing neural networks for a specific task: hate speech recognition. The most important thing is the data set, because the model needs examples of hateful and non-hateful texts to learn how to tell them apart. To get the best results you need to experiment with different architectures and model hyperparameters.
The hate speech recognition model gives us another grouping of our data set. Now we can see how our audience reacts and how many hateful comments or posts it creates. What’s more, by combining this with the time of publication of each comment, we can see if there was a specific time period when the most hateful comments were generated, as shown in the histogram below.
Combining this distribution with recent posts or events can give you insight into the type of content that provokes people. Changes in the share of hate speech over time can also be related to changes in topic distribution. Combining all the information from the analysis can provide an in-depth picture of the dataset.
As the histogram above shows, most hate is connected to topics 3, 6 and 7. Knowing what makes people angry gives you the opportunity to avoid sensitive topics in the future.
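Once every comment carries both a topic label and a hate speech prediction, linking the two is a matter of counting. The sketch below is a minimal example under assumptions: the function name `hate_share_by_group`, the `(group, is_hateful)` record format and the sample data are hypothetical. The group can be a topic number, as here, or a publication week if you want the time-based view.

```python
from collections import Counter


def hate_share_by_group(records):
    """records: iterable of (group, is_hateful) pairs -> {group: share of hateful texts}."""
    total = Counter()
    hateful = Counter()
    for group, is_hateful in records:
        total[group] += 1
        hateful[group] += int(is_hateful)
    return {g: hateful[g] / total[g] for g in sorted(total)}


# Hypothetical data: (topic, model prediction) for each comment.
records = [(3, True), (3, True), (3, False), (1, False)]
shares = hate_share_by_group(records)
```

Reporting the share next to the raw count helps, because a large topic can collect many hateful comments simply by being large.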
The same goes for sentiment analysis. We can produce similar visualizations for positive, negative or neutral comments and see their distribution over time or across topics. If you would like to read the whole report based on our analysis of the 8 weeks of data, you can find it here (only Polish version).
At Sotrender we have models for hate speech and sentiment recognition that are constantly improved and updated for social media texts. What’s more, we have experience in building topic models for individual cases. As you can see, there are a lot of benefits coming from this type of analysis:
- Getting to know your audience
- Getting an in-depth look at the topics of comments
- Discovering trending themes
- Finding the source of hatred or negativity in your content
To name just a few!
Yin, Jianhua and Jianyong Wang, A Dirichlet multinomial mixture model-based approach for short text clustering, (2014), KDD ’14.
Original post: https://towardsdatascience.com/social-media-and-topic-modeling-how-to-analyze-posts-in-practice-d84fc0c613cb