Major national and global events are often accompanied by dramatic spikes in user activity across social media platforms. Sometimes these events are known in advance, like the Super Bowl, political elections, and New Year’s celebrations around the world. Other times, the spikes in volume are due to unexpected happenings such as natural disasters, unplanned political events, pop culture moments, or health pandemics like COVID-19. These bursts of user activity may be short-lived (measured in seconds), or they may be sustained for several minutes. No matter their origin, it is important to consider the impact that they can have on applications consuming data from X. Here are some best practices that will help your team prepare for high-volume social data events.
With streams, staying connected is essential to avoid missing data. Your client application should be able to detect a disconnect and immediately attempt to reconnect, backing off exponentially if reconnect attempts continue to fail.
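Below is a minimal sketch of this reconnect pattern in Python using the requests library. The endpoint URL, bearer token, and the handle_activity function are illustrative placeholders, not part of any specific X API contract:

```python
import time
import requests

STREAM_URL = "https://api.twitter.com/2/tweets/sample/stream"  # example endpoint
BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder credential


def handle_activity(raw_bytes):
    pass  # placeholder: hand the activity off for processing (see next section)


def stream_with_backoff(max_backoff=320):
    """Connect to the stream, reconnecting with exponential backoff on failure."""
    backoff = 1  # seconds; doubles after each failed attempt
    while True:
        try:
            with requests.get(
                STREAM_URL,
                headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
                stream=True,
                timeout=90,  # a read timeout helps detect silent disconnects
            ) as response:
                response.raise_for_status()
                backoff = 1  # reset the backoff after a successful connection
                for line in response.iter_lines():
                    if line:
                        handle_activity(line)  # keep this loop lightweight
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            time.sleep(backoff)
            backoff = min(backoff * 2, max_backoff)  # cap the retry interval
```

Capping the backoff (here at roughly five minutes) keeps the client from waiting indefinitely between attempts during an extended outage, while the exponential growth avoids hammering the endpoint while it is unavailable.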
Building a multi-threaded application is a key strategy for handling high-volume streams. At a high level, a best practice for managing data streams is to have a separate thread/process that establishes the streaming connection and then writes received JSON activities to a memory structure or a buffered stream reader. This ‘light-weight’ stream processing thread is responsible for handling incoming data, which can be buffered in memory, growing and shrinking as needed. Then a different thread consumes that buffer and does the ‘heavy lifting’ of parsing the JSON, preparing database writes, or whatever else your application needs to do, as in the sketch below.
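One way to sketch this producer/consumer split in Python is with the standard-library queue module as the in-memory buffer. The response object is assumed to be a streaming connection like the one above, and save_to_database is a hypothetical stand-in for your storage layer:

```python
import json
import queue
import threading

# Bounded in-memory buffer between the reader and the workers; a full queue
# applies backpressure instead of growing without limit during a spike.
activity_queue = queue.Queue(maxsize=100_000)


def reader_thread(response):
    """Lightweight thread: read raw lines off the stream and buffer them."""
    for line in response.iter_lines():
        if line:
            activity_queue.put(line)  # blocks briefly if the buffer is full


def save_to_database(activity):
    pass  # placeholder for your persistence logic


def worker_thread():
    """Heavy-lifting thread: parse the JSON and persist each activity."""
    while True:
        raw = activity_queue.get()
        activity = json.loads(raw)
        save_to_database(activity)
        activity_queue.task_done()


# Start several workers so parsing and writing keep pace during volume spikes.
for _ in range(4):
    threading.Thread(target=worker_thread, daemon=True).start()
```

Separating the threads this way means a slow database write stalls only a worker, not the stream connection itself, so the reader can keep draining the socket while the buffer absorbs the burst.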
These events may occur overnight or over the weekend, so be sure that your team is prepared to respond to spikes outside of normal business hours.