Integration guide
Working with resumable uploads
With the batch compliance endpoints, developers can upload large batches of X data and find out what action is needed to ensure that their datasets reflect user intent and the current state of content on X. Uploading a large amount of data to a remote server is relatively straightforward when systems and connectivity are stable and reliable, but that is not always the case. Some environments impose a connection timeout that cuts the connection between your app and the upload server after a set amount of time. You may also run into connection problems, for example when uploading a large file from a laptop over Wi-Fi. In these circumstances, it is better to upload the file in smaller portions than to rely on a single continuous connection.
X’s batch compliance endpoints rely on Google Cloud Storage to process large files. Cloud Storage supports a technique for managing large files called resumable uploads.
If an upload fails at any point, Google Cloud Storage can resume the operation from where it left off.
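Resuming is possible because an upload session keeps track of how many bytes it has stored. In Google Cloud Storage's resumable-upload protocol, a status check against an unfinished session is answered with HTTP `308` and, once any bytes are stored, a `Range` header such as `bytes=0-262143` (an inclusive range). The following minimal Node.js sketch turns that header into the offset to resume from; `nextOffsetFromRange` is a hypothetical helper, not part of any library.

```javascript
// Sketch: work out where to resume a Google Cloud Storage resumable upload.
// A status check answers 308 with a header such as "Range: bytes=0-262143",
// meaning bytes 0 through 262143 (inclusive) are already stored. A 308 with
// no Range header means nothing has been stored yet.

/**
 * Returns the offset of the first byte that still needs to be sent.
 * @param {string | undefined} rangeHeader value of the Range response header
 * @returns {number}
 */
function nextOffsetFromRange(rangeHeader) {
  if (!rangeHeader) {
    return 0; // nothing persisted yet: start from the beginning
  }
  const match = /^bytes=0-(\d+)$/.exec(rangeHeader.trim());
  if (!match) {
    throw new Error(`Unexpected Range header: ${rangeHeader}`);
  }
  // The range end is inclusive, so the next byte to send is end + 1.
  return Number(match[1]) + 1;
}

console.log(nextOffsetFromRange("bytes=0-262143")); // 262144
console.log(nextOffsetFromRange(undefined)); // 0
```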
Creating a resumable job
Step one:
First, create a compliance job and use the type parameter to specify whether you will be uploading Post IDs or user IDs. Additionally, add resumable to the request body and set it to true. Make sure to replace $APP_ACCESS_TOKEN in the request below with your App-only Access Token.
If your API call is successful, you will get a response similar to the following:
Take note of the upload_url value; you will need it in the following steps.
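If you prefer to script step one, the request can be sketched in Node.js. This minimal sketch uses Node's built-in `https` module rather than a third-party library, and it assumes that the endpoint is `POST https://api.twitter.com/2/compliance/jobs`, that `type` takes `"tweets"` for Post IDs or `"users"` for user IDs, and that the response nests the URL as `data.upload_url`; check all three against the current API reference. `buildJobBody`, `extractUploadUrl` and `createResumableJob` are hypothetical helper names.

```javascript
const https = require("node:https");

// Assumption: the batch compliance job endpoint is POST /2/compliance/jobs on
// api.twitter.com; verify the host and path against the current API reference.
const JOBS_URL = "https://api.twitter.com/2/compliance/jobs";

// Builds the JSON body for a resumable job.
// type is "tweets" for Post IDs or "users" for user IDs (assumed values).
function buildJobBody(type) {
  if (type !== "tweets" && type !== "users") {
    throw new Error(`type must be "tweets" or "users", got ${type}`);
  }
  return JSON.stringify({ type, resumable: true });
}

// Pulls upload_url out of a parsed response shaped like { data: { upload_url } }.
function extractUploadUrl(response) {
  const url = response && response.data && response.data.upload_url;
  if (typeof url !== "string") {
    throw new Error("Response has no data.upload_url");
  }
  return url;
}

// Sends the create-job request with an App-only Bearer token and resolves
// with the upload_url from a 2xx response.
function createResumableJob(appAccessToken, type) {
  const body = buildJobBody(type);
  return new Promise((resolve, reject) => {
    const req = https.request(
      JOBS_URL,
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${appAccessToken}`,
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(body),
        },
      },
      (res) => {
        let raw = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (raw += chunk));
        res.on("end", () => {
          try {
            if (res.statusCode < 200 || res.statusCode >= 300) {
              throw new Error(`HTTP ${res.statusCode}: ${raw}`);
            }
            resolve(extractUploadUrl(JSON.parse(raw)));
          } catch (err) {
            reject(err);
          }
        });
      }
    );
    req.on("error", reject);
    req.end(body);
  });
}
```

For example, `createResumableJob(process.env.APP_ACCESS_TOKEN, "tweets").then(console.log)` would print the upload_url, assuming your token is exported in that environment variable.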
Step two:
Next, initiate the resumable upload by making a POST request to the upload_url from the previous step. Make sure to include the following headers:
Content-Type: text/plain
Content-Length: 0
x-goog-resumable: start
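Step two can be sketched in Node.js with the built-in `https` module (the function names are hypothetical). The sketch sends an empty POST with exactly the three headers above and, on a `201`, resolves with the `Location` response header, which is the session URI that later uploads go to.

```javascript
const https = require("node:https");

// The three headers listed above, which mark this POST as the start of a
// Google Cloud Storage resumable upload session.
function sessionStartHeaders() {
  return {
    "Content-Type": "text/plain",
    "Content-Length": "0",
    "x-goog-resumable": "start",
  };
}

// POSTs an empty body to upload_url and resolves with the Location header
// (the session URI) when the server answers 201 Created.
function startResumableSession(uploadUrl) {
  return new Promise((resolve, reject) => {
    const req = https.request(
      uploadUrl,
      { method: "POST", headers: sessionStartHeaders() },
      (res) => {
        res.resume(); // discard the body; only the status and headers matter
        if (res.statusCode !== 201) {
          reject(new Error(`Expected 201, got ${res.statusCode}`));
        } else if (!res.headers.location) {
          // Node lower-cases response header names, hence "location".
          reject(new Error("201 response had no Location header"));
        } else {
          resolve(res.headers.location);
        }
      }
    );
    req.on("error", reject);
    req.end(); // empty body, matching Content-Length: 0
  });
}
```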
If this call is successful, you will get a 201 response code. Then copy the value of the location header from the response headers; it will look something like this:
You can then upload your Post or user IDs to this location by following the quick start guide from step two onwards.
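The quick start guide uploads the whole file in one request. Against a resumable session, Google Cloud Storage also accepts the file in pieces. Each `PUT` to the session URI carries a `Content-Range` header of the form `bytes START-END/TOTAL` (END inclusive), and every chunk except the last must be a multiple of 256 KiB. An empty `PUT` with `Content-Range: bytes */TOTAL` asks the session how much it has stored. The sketch below only computes those headers; `planChunks` and `statusQueryRange` are hypothetical names, and the 2 MiB default chunk size is an arbitrary choice.

```javascript
// Google Cloud Storage requires every chunk except the last one to be a
// multiple of 256 KiB.
const CHUNK_UNIT = 256 * 1024;

// Plans the PUT requests for a file of `totalBytes`, beginning at `startOffset`
// (0 for a new upload, or the offset reported by a status query). Each entry
// holds the inclusive byte range and the matching Content-Range header.
function planChunks(totalBytes, startOffset = 0, chunkSize = 8 * CHUNK_UNIT) {
  if (chunkSize <= 0 || chunkSize % CHUNK_UNIT !== 0) {
    throw new Error("chunkSize must be a positive multiple of 256 KiB");
  }
  const chunks = [];
  for (let start = startOffset; start < totalBytes; start += chunkSize) {
    const end = Math.min(start + chunkSize, totalBytes) - 1;
    chunks.push({ start, end, contentRange: `bytes ${start}-${end}/${totalBytes}` });
  }
  return chunks;
}

// Content-Range for an empty status-query PUT ("how much do you have?").
function statusQueryRange(totalBytes) {
  return `bytes */${totalBytes}`;
}

// A 600,000-byte file sent in 256 KiB chunks needs three requests.
planChunks(600000, 0, CHUNK_UNIT).forEach((c) => console.log(c.contentRange));
// bytes 0-262143/600000
// bytes 262144-524287/600000
// bytes 524288-599999/600000
```

With `startOffset` set to the offset recovered from a status query, the same function plans only the bytes that are still missing.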
Because of their technical complexity, resumable uploads are best handled in code. This guide uses Node.js with the needle request library.
Install the dependencies
Before proceeding, you should have a Node.js environment installed; you can download Node.js from its official website. Node.js includes a utility called npm. Make sure both Node.js and npm are installed by running the following command and checking that it doesn’t result in an error:
$ npm -v
6.4.1
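A version number similar to this one means your environment is ready (your version number may differ). We will use npm to install the upload library. Run this command from the directory where you will save your script, so that Node.js can find the library when the script loads it:
$ npm install needle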
You’re all set; there is no additional configuration required.
Request a resumable destination
When creating a new job, set the resumable parameter to true so you can get a destination that supports a resumable upload. In the response payload, you will receive an upload_url value.
Prepare the code to upload a file
By default, the library creates a new upload destination from an upload location (called a bucket) and the name of the file you wish to upload. Because the batch compliance endpoints create their own destination, we need to tell the library that we already have a location ready to accept our upload.
We will need to pass the upload_url value to the upload library, along with the name of the file containing the data to upload. Create a file named twitter-upload.js and add the following code:
Save the file wherever it makes the most sense. Next, in your command line, invoke the script and pass two parameters:
- The first will be the location of the file (with the Post or User IDs) that you wish to upload.
- The second will be the upload URL we received from the compliance endpoint response.
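Signed upload URLs usually carry several query parameters joined by `&`. An unquoted `&` is a shell control operator, so everything from the first `&` onwards is cut off before the script ever sees it. As a safeguard, a script could check the URL it receives. This sketch assumes that upload_url has at least two query parameters, which holds for typical signed Cloud Storage URLs but is not guaranteed by the API; `checkUploadUrl` is a hypothetical helper.

```javascript
// Hypothetical check: a signed upload_url carries several query parameters
// joined by "&". Unquoted in a shell, everything from the first "&" is lost,
// leaving at most one parameter behind.
function checkUploadUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    throw new Error(`Not a valid URL: ${raw}`);
  }
  if (url.protocol !== "https:") {
    throw new Error("upload_url should use https");
  }
  if ([...url.searchParams.keys()].length < 2) {
    throw new Error("upload_url looks truncated; wrap it in double quotes");
  }
  return url.href;
}
```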
Make sure the URL is wrapped in double quotes, and do the same for your file name if it contains spaces or other special characters:
You will see output similar to this:
Starting upload to: https://storage.googleapis.com/twttr-tweet-compliance/<redacted>
Upload not completed, resuming
Initiating upload
You can pause the upload at any time by pressing Ctrl + C or closing your command line. When you run the same command again later, the upload resumes from where it left off. Once the file has been completely uploaded, you will see the following message:
Upload complete
At this point, you can use the compliance status endpoint to check the status of your compliance job, and download the compliance results once the job is complete.
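Checking the status can also be scripted. The sketch below assumes that a job's status is read from `GET https://api.twitter.com/2/compliance/jobs/:id` (with the `id` taken from the create-job response), authenticated with the same App-only token. It also assumes that `data.status` finishes as one of `complete`, `failed` or `expired`; verify these details against the API reference. `isFinished`, `getJson` and `waitForJob` are hypothetical names, and the 30-second polling interval is arbitrary.

```javascript
const https = require("node:https");

// Assumption: job status comes from GET /2/compliance/jobs/:id and data.status
// ends as "complete", "failed" or "expired"; confirm against the API reference.
const TERMINAL = new Set(["complete", "failed", "expired"]);

function isFinished(status) {
  return TERMINAL.has(status);
}

// GETs a URL with a Bearer token and resolves with the parsed JSON body.
function getJson(url, token) {
  return new Promise((resolve, reject) => {
    https
      .get(url, { headers: { Authorization: `Bearer ${token}` } }, (res) => {
        let raw = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => (raw += chunk));
        res.on("end", () => {
          try {
            resolve(JSON.parse(raw));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on("error", reject);
  });
}

// Polls until the job reaches a terminal status, then returns its data object.
// When the status is "complete", that object is expected to include download_url.
async function waitForJob(jobId, token, intervalMs = 30000) {
  const url = `https://api.twitter.com/2/compliance/jobs/${encodeURIComponent(jobId)}`;
  for (;;) {
    const { data } = await getJson(url, token);
    if (!data) throw new Error("Response has no data object");
    if (isFinished(data.status)) return data;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```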