Data dictionary: Enterprise
Interested in learning more about how the enterprise data formats map to the X API v2 format?
Check out our comparison guides:
Twitter API: Enterprise data dictionary
Introduction
Posts are the basic atomic building block of all things X. All X APIs that return Posts provide that data encoded using JavaScript Object Notation (JSON). JSON is based on key-value pairs, with named attributes and associated values. Post objects retrieved from the API include an X User’s “status update”, but Retweets, replies, and quote Tweets are all also Post objects. If a Post is related to another Post, as a Retweet, reply or quote Tweet, each will be identified or embedded into the Post object. Even the simplest Post in the native X data format will have nested JSON objects to represent the other attributes of a Post, such as the author, mentioned users, tagged place location, hashtags, cashtag symbols, media or URL links. When working with X data, this is an important concept to understand. The format of the Post data you will receive from the X API depends on the type of Post received, the X API you are using, and the format settings.
Enterprise endpoints that return Post objects have been updated to provide the metadata needed to understand the Post’s edit history. Learn more about these metadata on the “Edit Posts” fundamentals page.
In native X format, the JSON payload will include ‘root-level’ attributes and nested JSON objects (which are written with the {} notation).
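As a rough illustration of that layout, the following Python sketch separates scalar root-level attributes from nested JSON objects. The payload is a hypothetical, heavily trimmed Post assembled for this example (the nested objects are left empty), and `split_attributes` is an illustrative helper name, not part of any X library.

```python
import json

# Hypothetical, heavily trimmed payload; real Posts carry many more attributes
# and the nested objects are not empty.
payload = json.loads("""
{
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "id_str": "1050118621198921728",
  "text": "An example status update",
  "user": {},
  "entities": {},
  "place": {}
}
""")


def split_attributes(post):
    """Separate scalar root-level attributes from nested JSON objects."""
    scalars = {k: v for k, v in post.items() if not isinstance(v, dict)}
    nested = {k: v for k, v in post.items() if isinstance(v, dict)}
    return scalars, nested


scalars, nested = split_attributes(payload)
print("root-level attributes:", sorted(scalars))
print("nested objects:", sorted(nested))
```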
Available data formats
Please note: It is highly recommended to use the Enriched Native format for enterprise data APIs.
The Enriched Native format includes all new metadata since 2017, such as poll metadata, and additional metrics such as reply_count and quote_count.
Activity Streams format has not been updated with new metadata or enrichments since the character update in 2017.
Enterprise data APIs deliver data in two different formats. The enterprise format closest to the standard v1.1 native format is Native Enriched. The legacy enterprise data format is Activity Streams, originally implemented and used by Gnip as a normalized format across X and other social media data providers at the time. While this format is still available, X has only invested in new features and developments for the native enriched format since 2017.
The enriched native format is exactly what it sounds like: it includes native X objects as well as additional enrichments available for enterprise data products, such as URL unwinding metadata, profile geo, poll metadata and additional engagement metrics.
- Expanded and enhanced URLs enrichment
- Matching rules enrichment
- Poll metadata enrichment
- Profile geo enrichment
Object comparison per data format
Whatever your X use case, understanding what these JSON-encoded Post objects and attributes represent is critical to successfully finding your data signals of interest. To help in that effort, there is a set of pages dedicated to each object in each data format.
Reflecting the JSON hierarchy above, here are links to each of these objects:
Native Enriched | Activity Streams |
---|---|
Post object | Activity object |
User object | Actor object |
Entities object | X entities object |
Extended entities object | [X extended entities object](/x-api/enterprise-gnip-2.0/fundamentals/data-dictionary#x-extended-entities) |
Geo object | Location object |
n/a | Gnip object |
Parsing best-practices
- X JSON is encoded using UTF-8 characters.
- Parsers should tolerate variance in the ordering of fields with ease. It should be assumed that Post JSON is served as an unordered hash of data.
- Parsers should tolerate the addition of ‘new’ fields.
- JSON parsers must be tolerant of ‘missing’ fields, since not all fields appear in all contexts.
- It is generally safe to consider a nulled field, an empty set, and the absence of a field as the same thing.
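The practices above can be sketched in Python as follows. This is a minimal illustration, not an official parser: `get_field` is a hypothetical helper, and the payload (including the `brand_new_field` key) is invented for the example.

```python
import json


def get_field(obj, key, default=None):
    """Return obj[key], treating a missing key, a null value, and an
    empty collection as the same 'no value' case."""
    value = obj.get(key)
    if value is None or value == [] or value == {}:
        return default
    return value


# UTF-8 bytes as they might arrive over the wire. Key order is arbitrary,
# and "brand_new_field" stands in for an attribute added after the parser
# was written; the parser simply never asks for it.
raw = (
    b'{"brand_new_field": 1, "place": null, "coordinates": {}, '
    b'"id_str": "1050118621198921728", "retweet_count": 0}'
)
post = json.loads(raw)  # json.loads accepts UTF-8 encoded bytes directly

print(get_field(post, "id_str"))
print(get_field(post, "place"))          # null field
print(get_field(post, "coordinates"))    # empty object
print(get_field(post, "geo"))            # absent field
print(get_field(post, "retweet_count"))  # a real zero is kept
```

Note that a legitimate `0` or `false` is returned unchanged; only null, empty, and absent values collapse to the default.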
Enterprise Native Enriched data objects
Native Enriched Tweet object
Interested in learning more about how the Native Enriched data format maps to the X API v2 format?
Check out our comparison guide: Native Enriched compared to X API v2
Post Object
When using enterprise data products, you will notice that much of the data dictionary is similar to the native format of Post data, with some additional enriched metadata. The base level of the native enriched format uses many of the same object names as the X API v1.1 data format. The Post object has a long list of ‘root-level’ attributes, including fundamental attributes such as id, created_at, and text. Post objects will also have nested objects, including user, entities, and extended_entities. Post objects may also contain other nested Post objects such as retweeted_status, quoted_status and extended_tweet. The native enriched format will additionally have a matching_rules object.
X Data Dictionary
Below you will find the data dictionary for these ‘root-level’ attributes, as well as links to child object data dictionaries.
Attribute | Type | Description |
---|---|---|
created_at | String | UTC time when this Post was created. Example: “created_at”: “Wed Oct 10 20:19:24 +0000 2018” |
id | Int64 | The integer representation of the unique identifier for this Post. This number is greater than 53 bits and some programming languages may have difficulty/silent defects in interpreting it. Using a signed 64 bit integer for storing this identifier is safe. Use id_str to fetch the identifier to be safe. See X IDs for more information. Example:“id”:1050118621198921728 |
id_str | String | The string representation of the unique identifier for this Post. Implementations should use this rather than the large integer in id . Example:“id_str”:“1050118621198921728” |
text | String | The actual UTF-8 text of the status update. See X-text for details on what characters are currently considered valid. Example: “text”:“To make room for more expression, we will now count all emojis as equal—including those with gender and skin t… https://t.co/MkGjXf9aXm” |
source | String | Utility used to post the Post, as an HTML-formatted string. Posts from the X website have a source value of web. Example: “source”:“Twitter Web Client” |
truncated | Boolean | Indicates whether the value of the text parameter was truncated, for example, as a result of a retweet exceeding the original Post text length limit of 140 characters. Truncated text will end in an ellipsis, like this ... Since X now rejects long Posts rather than truncating them, the large majority of Posts will have this set to false. Note that while native retweets may have their top-level text property shortened, the original text will be available under the retweeted_status object and the truncated parameter will be set to the value of the original status (in most cases, false). Example: “truncated”:true |
in_reply_to_status_id | Int64 | Nullable. If the represented Post is a reply, this field will contain the integer representation of the original Post’s ID. Example: “in_reply_to_status_id”:1051222721923756032 |
in_reply_to_status_id_str | String | Nullable. If the represented Post is a reply, this field will contain the string representation of the original Post’s ID. Example: “in_reply_to_status_id_str”:“1051222721923756032” |
in_reply_to_user_id | Int64 | Nullable. If the represented Post is a reply, this field will contain the integer representation of the original Post’s author ID. This will not necessarily always be the user directly mentioned in the Post. Example: “in_reply_to_user_id”:6253282 |
in_reply_to_user_id_str | String | Nullable. If the represented Post is a reply, this field will contain the string representation of the original Post’s author ID. This will not necessarily always be the user directly mentioned in the Post. Example: “in_reply_to_user_id_str”:“6253282” |
in_reply_to_screen_name | String | Nullable. If the represented Post is a reply, this field will contain the screen name of the original Post’s author. Example: “in_reply_to_screen_name”:“twitterapi” |
user | User object | The user who posted this Post. See User data dictionary for complete list of attributes. Example highlighting select attributes: { “user”: { “id”: 6253282, “id_str”: “6253282”, “name”: “Twitter API”, “screen_name”: “TwitterAPI”, “location”: “San Francisco, CA”, “url”: “https://developer.twitter.com”, “description”: “The Real Twitter API. Tweets about API changes, service issues and our Developer Platform. Don’t get an answer? It’s on my website.”, “verified”: true, “followers_count”: 6129794, “friends_count”: 12, “listed_count”: 12899, “favourites_count”: 31, “statuses_count”: 3658, “created_at”: “Wed May 23 06:01:13 +0000 2007”, “utc_offset”: null, “time_zone”: null, “geo_enabled”: false, “lang”: “en”, “contributors_enabled”: false, “is_translator”: false, “profile_background_color”: “null”, “profile_background_image_url”: “null”, “profile_background_image_url_https”: “null”, “profile_background_tile”: null, “profile_link_color”: “null”, “profile_sidebar_border_color”: “null”, “profile_sidebar_fill_color”: “null”, “profile_text_color”: “null”, “profile_use_background_image”: null, “profile_image_url”: “null”, “profile_image_url_https”: “https://pbs.twimg.com/profile_images/942858479592554497/BbazLO9L_normal.jpg”, “profile_banner_url”: “https://pbs.twimg.com/profile_banners/6253282/1497491515”, “default_profile”: false, “default_profile_image”: false, “following”: null, “follow_request_sent”: null, “notifications”: null } } |
coordinates | Coordinates | Nullable. Represents the geographic location of this Post as reported by the user or client application. The inner coordinates array is formatted as geoJSON (longitude first, then latitude). Example: “coordinates”: { “coordinates”: [ -75.14310264, 40.05701649 ], “type”:“Point” } |
place | Places | Nullable. When present, indicates that the Post is associated with (but not necessarily originating from) a Place. Example: “place”: { “attributes”: {}, “bounding_box”: { “coordinates”: [[ [-77.119759,38.791645], [-76.909393,38.791645], [-76.909393,38.995548], [-77.119759,38.995548] ]], “type”:“Polygon” }, “country”:“United States”, “country_code”:“US”, “full_name”:“Washington, DC”, “id”:“01fbe706f872cb32”, “name”:“Washington”, “place_type”:“city”, “url”:“http://api.x.com/1/geo/id/0172cb32.json” } |
quoted_status_id | Int64 | This field only surfaces when the Post is a quote Tweet. This field contains the integer value Post ID of the quoted Tweet. Example: “quoted_status_id”:1050119905717055488 |
quoted_status_id_str | String | This field only surfaces when the Post is a quote Tweet. This is the string representation Post ID of the quoted Tweet. Example: “quoted_status_id_str”:“1050119905717055488” |
is_quote_status | Boolean | Indicates whether this is a Quoted Tweet. Example: “is_quote_status”:false |
quoted_status | Post | This field only surfaces when the Post is a quote Tweet. This attribute contains the Post object of the original Post that was quoted. |
retweeted_status | Post | Users can amplify the broadcast of Posts authored by other users by Retweeting . Retweets can be distinguished from typical Posts by the existence of a retweeted_status attribute. This attribute contains a representation of the original Post that was retweeted. Note that retweets of retweets do not show representations of the intermediary retweet, but only the original Post. (Users can also unretweet a retweet they created by deleting their retweet.) |
quote_count | Integer | Nullable. Indicates approximately how many times this Post has been quoted by X users. Example: “quote_count”:33 Note: This object is only available with the Premium and Enterprise tier products. |
reply_count | Int | Number of times this Post has been replied to. Example: “reply_count”:30 Note: This object is only available with the Premium and Enterprise tier products. |
retweet_count | Int | Number of times this Post has been retweeted. Example: “retweet_count”:160 |
favorite_count | Integer | Nullable. Indicates approximately how many times this Post has been liked by X users. Example: “favorite_count”:295 |
entities | Entities | Entities which have been parsed out of the text of the Post. Additionally see Entities in X Objects. Example: “entities”: { “hashtags”:[], “urls”:[], “user_mentions”:[], “media”:[], “symbols”:[], “polls”:[] } |
extended_entities | Extended Entities | When between one and four native photos or one video or one animated GIF are in a Post, contains an array of ‘media’ metadata. This is also available in Quote Tweets. Additionally see Entities in X Objects. Example: “extended_entities”: { “media”:[] } |
favorited | Boolean | Nullable. Indicates whether this Post has been liked by the authenticating user. Example: “favorited”:true |
retweeted | Boolean | Indicates whether this Post has been Retweeted by the authenticating user. Example: “retweeted”:false |
possibly_sensitive | Boolean | Nullable. This field indicates content may be recognized as sensitive. The Post author can select within their own account preferences and choose “Mark media you post as having material that may be sensitive” so each Post created after has this flag set. This may also be judged and labeled by an internal X support agent. Example: “possibly_sensitive”:false |
filter_level | String | Indicates the maximum value of the filter_level parameter which may be used and still stream this Post. So a value of medium will be streamed on none, low, and medium streams. Example: “filter_level”: “low” |
lang | String | Nullable. When present, indicates a BCP 47 language identifier corresponding to the machine-detected language of the Post text, or und if no language could be detected. Example: “lang”: “en” |
edit_history | Object | Unique identifiers indicating all versions of a Post. For Posts with no edits, there will be one ID. For Posts with an edit history, there will be multiple IDs, arranged in ascending order reflecting the order of edits, with the most recent version in the last position of the array. The Post IDs can be used to hydrate and view previous versions of a Post. Example: “edit_history”: { “initial_tweet_id”: “1283764123”, “edit_tweet_ids”: [“1283764123”, “1394263866”] } |
edit_controls | Object | When present, indicates how long a Post is still editable for and the number of remaining edits. Posts are only editable for the first 30 minutes after creation and can be edited up to five times. The Post IDs can be used to hydrate and view previous versions of a Post. Example: “edit_controls”: { “editable_until_ms”: 123, “edits_remaining”: 3 } |
editable | Boolean | When present, indicates if a Post was eligible for edit when published. This field is not dynamic and won’t toggle from True to False when a Post reaches its editable time limit, or maximum number of edits. The following Post features will cause this field to be false: Post is promoted; Post has a poll; Post is a non-self-thread reply; Post is a retweet (note that Quote Tweets are eligible for edit); Post is nullcast; Community Post; Superfollow Post; Collaborative Post. |
matching_rules | Array of Rule Objects | Present in filtered products such as X Search and PowerTrack. Provides the id and tag associated with the rule that matched the Post. More on matching rules here. With PowerTrack, more than one rule can match a Post. Example: “matching_rules”: [ { “tag”: “twitterapi emojis”, “id”: 1050118621198921728, “id_str”: “1050118621198921728” } ] |
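A few of the attributes above need some care when parsed. The following Python sketch, which assumes nothing beyond the standard library, parses the `created_at` timestamp, shows why `id_str` is the safer identifier, and picks the most recent version from `edit_history`. The helper names are illustrative, and the sample Post combines example values from the table rather than reproducing a real payload.

```python
from datetime import datetime

# Format of created_at as shown in the table, e.g. "Wed Oct 10 20:19:24 +0000 2018".
# Note that %a and %b depend on the process locale being English/C.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, CREATED_AT_FORMAT)


def latest_version_id(post):
    """Return the ID of the most recent edit, falling back to id_str."""
    ids = (post.get("edit_history") or {}).get("edit_tweet_ids") or []
    return ids[-1] if ids else post.get("id_str")


# Sample assembled from the example values in the table above.
post = {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "id": 1050118621198921728,
    "id_str": "1050118621198921728",
    "edit_history": {
        "initial_tweet_id": "1283764123",
        "edit_tweet_ids": ["1283764123", "1394263866"],
    },
}

created = parse_created_at(post["created_at"])
print(created.isoformat())
print(latest_version_id(post))

# Why id_str matters: languages that store JSON numbers as 64-bit floats
# cannot distinguish neighbouring IDs of this size. The "+ 1" ID here is
# hypothetical and only used to show the collision.
print(float(post["id"]) == float(post["id"] + 1))
```

Python integers are arbitrary precision, so the `id` field itself is safe here; the float comparison only simulates what happens in environments with a single double-precision number type.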
Additional Post attributes
X APIs that provide Posts (e.g. the GET statuses/lookup endpoint) may include these additional Post attributes:
Attribute | Type | Description |
---|---|---|
current_user_retweet | Object | Perspectival. Only surfaces on methods supporting the include_my_retweet parameter, when set to true. Details the Post ID of the user’s own retweet (if existent) of this Post. Example: “current_user_retweet”: { “id”: 6253282, “id_str”: “6253282” } |
scopes | Object | A set of key-value pairs indicating the intended contextual delivery of the containing Post. Currently used by X’s Promoted Products. Example: “scopes”:{“followers”:false} |
withheld_copyright | Boolean | When present and set to “true”, it indicates that this piece of content has been withheld due to a DMCA complaint . Example: “withheld_copyright”: true |
withheld_in_countries | Array of String | When present, indicates a list of uppercase two-letter country codes this content is withheld from. X supports the following non-country values for this field: “XX” - Content is withheld in all countries “XY” - Content is withheld due to a DMCA request. Example: “withheld_in_countries”: [“GR”, “HK”, “MY”] |
withheld_scope | String | When present, indicates whether the content being withheld is the “status” or a “user.” Example: “withheld_scope”: “status” |
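The withheld attributes can be combined into a simple check, sketched below in Python. The function name is hypothetical; the handling of “XX” follows the description in the table above, and absent fields are treated as “not withheld”.

```python
def is_withheld_in(post, country_code):
    """Return True if the Post is withheld for the given two-letter country code."""
    countries = post.get("withheld_in_countries") or []
    if "XX" in countries:  # "XX" means withheld in all countries
        return True
    return country_code.upper() in countries


example = {"withheld_in_countries": ["GR", "HK", "MY"], "withheld_scope": "status"}

print(is_withheld_in(example, "gr"))
print(is_withheld_in(example, "US"))
print(is_withheld_in({"withheld_in_countries": ["XX"]}, "US"))
print(is_withheld_in({}, "US"))
```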
Deprecated Attributes
Field | Type | Description |
---|---|---|
geo | Object | Deprecated. Nullable. Use the coordinates field instead. This deprecated attribute has its coordinates formatted as [lat, long], while all other Post geo is formatted as [long, lat]. |
Nested Post objects
In several cases, a Post object will include other nested objects. If you are working with nested objects, then that JSON payload will contain multiple Post objects, and each Post object may contain its own objects. The root-level object will contain information on the type of action taken, i.e. whether it is a Retweet or a Quote Tweet, and may also contain an object that describes the ‘original’ Post being shared. Extended Posts will include a nested extended_tweet object holding text that extends beyond 140 characters; this structure was used to prevent breaking changes when the update was made in 2017. Each nested object dictionary is described below.
Retweets
Retweets always contain two Post objects. The ‘original’ Post being Retweeted is provided in a “retweeted_status” object. The root-level object encapsulates the Retweet itself, including a User object for the account taking the Retweet action and the time of the Retweet. Retweeting is an action to share a Post with your followers, and no other new content can be added. Also, a (new) location cannot be provided with a Retweet. While the ‘original’ Post may have been geo-tagged, the Retweet “geo” and “place” objects will always be null.
Even before the introduction of Extended Posts, the root-level “entities” object was in some cases truncated and incomplete due to the “RT @username ” string being prepended to the Post message being Retweeted. Note that if a Retweet gets Retweeted, the “retweeted_status” will still point to the original Post, meaning the intermediate Retweet is not included. Similar behavior is seen when using x.com to ‘display’ a Retweet. If you copy the unique Post ID assigned to the Retweet ‘action’, the original Post is displayed.
Below is an example structure for a Retweet. Again, when parsing Retweets, it is key to parse the “retweeted_status” object for complete (original) Post message and entity metadata.
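A minimal Python sketch of that parsing rule follows. The Retweet payload is hypothetical and reduced to a handful of fields; `original_post` is an illustrative helper name.

```python
def original_post(post):
    """Return the Post whose text and entities are complete:
    the nested retweeted_status for a Retweet, otherwise the Post itself."""
    return post.get("retweeted_status") or post


# Hypothetical Retweet: the root-level text is shortened by the "RT @..." prefix.
retweet = {
    "id_str": "2",
    "text": "RT @example: hello wor…",
    "user": {"screen_name": "retweeter"},
    "retweeted_status": {
        "id_str": "1",
        "text": "hello world",
        "user": {"screen_name": "example"},
    },
}

source = original_post(retweet)
print(retweet["user"]["screen_name"], "retweeted", source["user"]["screen_name"])
print(source["text"])
```

The root-level object still matters for who retweeted and when; only the message text and entity metadata should be read from the nested object.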
Quote Tweets
Quote Tweets are much like Retweets except that they include a new Post message. These new messages can contain their own set of hashtags, links, and other “entities” metadata. Quote Tweets can also include location information shared by the user posting the Quote Tweet, along with media such as GIFs, videos, and photos.
Quote Tweets will contain at least two Post objects, and in some cases, three. The Post being Quoted, which itself can be a Quoted Tweet, is provided in a “quoted_status” object. The root-level object encapsulates the Quote Tweet itself, including a User object for the account taking the sharing action and the time of the Quote Tweet.
Note that Quote Tweets can now have photos, GIFs, or videos, added to them using the ‘Post’ user-interface. When links to externally hosted media are included in the Quote Tweet message, the root-level “entities.urls” will describe those. Media attached to Quote Tweets will appear in the root-level “extended_entities” metadata.
When Quote Tweets were first launched, a shortened link (t.co URL) was appended to the ‘original’ Post message and provided in the root-level “text” field. In addition, metadata for that t.co URL was included in the root-level ‘entities.urls’ array. In May 2018, we changed this in two ways. First, the shortened t.co URL to the quoted Tweet is no longer included in the root-level “text” field. Second, the metadata for the quoted Tweet is no longer included in the “entities.urls” metadata. Instead, URL metadata for the quoted Tweet is provided in a new “quoted_status_permalink” object at the root level (or top level), at the same level as the “quoted_status” object.
Below is an example structure for a Quote Tweet using this original formatting.
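Because a payload may hold two or three Post objects, it can help to walk them generically. The Python sketch below does this for the nested keys named in this section; the payload is hypothetical and `iter_posts` is an illustrative helper, not an X-provided function.

```python
NESTED_POST_KEYS = ("retweeted_status", "quoted_status")


def iter_posts(post):
    """Yield the root Post, then every Post nested under the known keys."""
    yield post
    for key in NESTED_POST_KEYS:
        nested = post.get(key)
        if isinstance(nested, dict):
            yield from iter_posts(nested)


# Hypothetical Quote Tweet with a new message on top of the quoted Post.
quote = {
    "id_str": "20",
    "is_quote_status": True,
    "quoted_status_id_str": "10",
    "text": "my comment on this",
    "quoted_status": {"id_str": "10", "text": "the original message"},
}

for p in iter_posts(quote):
    print(p["id_str"], p["text"])
```

A Retweet of a Quote Tweet would yield three objects with the same function: the root, its `retweeted_status`, and the `quoted_status` nested inside it.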
Extended Posts
JSON that describes Extended Posts was introduced when 280-character Posts were launched in November 2017. Post JSON was extended to encapsulate these longer messages, while not breaking the thousands of apps parsing these fundamental X objects. To provide full backward compatibility, the original 140-character ‘text’ field, and the entity objects parsed from that, were retained. In the case of Posts longer than 140 characters, this root-level ‘text’ field would become truncated and thus incomplete. Since the root-level ‘entities’ objects contain arrays of key metadata parsed from the ‘text’ message, such as included hashtags and links, these collections would be incomplete. For example, if a Post message was 200 characters long, with a hashtag included at the end, the legacy root-level ‘entities.hashtags’ array would not include it.
A new ‘extended_tweet’ field was introduced to hold the longer Post messages and complete entity metadata. The “extended_tweet” object provides the “full_text” field that contains the complete, untruncated Post message when longer than 140 characters. The “extended_tweet” object also contains an “entities” object with complete arrays of hashtags, links, mentions, etc.
Extended Posts are identified with a root-level “truncated” boolean. When true (“truncated”: true), the “extended_tweet” fields should be parsed instead of the root-level fields.
Note in the JSON example below that the root-level “text” field is truncated and the root-level “entities.hashtags” array is empty even though the Post message includes three hashtags. Since this is an Extended Post, the “truncated” field is set to true, and the “extended_tweet” object provides complete “full_text” and “entities” Post metadata.
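The selection rule described above can be sketched in Python as follows. The payload is hypothetical, and the shape of a hashtag entry (an object with a “text” key) is an assumption made for the example; `full_text_and_entities` is an illustrative helper name.

```python
def full_text_and_entities(post):
    """Return (text, entities), preferring extended_tweet when truncated is true."""
    extended = post.get("extended_tweet")
    if post.get("truncated") and isinstance(extended, dict):
        text = extended.get("full_text", post.get("text", ""))
        return text, extended.get("entities") or {}
    return post.get("text", ""), post.get("entities") or {}


# Hypothetical Extended Post: the root-level text is cut off before the hashtag.
post = {
    "truncated": True,
    "text": "a long message that was cut off…",
    "entities": {"hashtags": []},
    "extended_tweet": {
        "full_text": "a long message that was cut off before its #hashtag",
        "entities": {"hashtags": [{"text": "hashtag"}]},
    },
}

text, entities = full_text_and_entities(post)
print(text)
print([h["text"] for h in entities.get("hashtags", [])])
```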
Native Enriched User object
The User object contains X User account metadata that describes the X User referenced.
User Data Dictionary
Attribute | Type | Description |
---|---|---|
id | Int64 | The integer representation of the unique identifier for this User. This number is greater than 53 bits and some programming languages may have difficulty/silent defects in interpreting it. Using a signed 64 bit integer for storing this identifier is safe. Use id_str to fetch the identifier to be safe. See X IDs for more information. Example:“id”: 6253282 |
id_str | String | The string representation of the unique identifier for this User. Implementations should use this rather than the large, possibly un-consumable integer in id . Example:“id_str”: “6253282” |
name | String | The name of the user, as they’ve defined it. Not necessarily a person’s name. Typically capped at 50 characters, but subject to change. Example: “name”: “API” |
screen_name | String | The screen name, handle, or alias that this user identifies themselves with. screen_names are unique but subject to change. Use id_str as a user identifier whenever possible. Typically a maximum of 15 characters long, but some historical accounts may exist with longer names. Example:“screen_name”: “api” |
location | String | Nullable. The user-defined location for this account’s profile. Not necessarily a location, nor machine-parseable. This field will occasionally be fuzzily interpreted by the Search service. Example: “location”: “San Francisco, CA” |
derived | Arrays of Enrichment Objects | Enterprise APIs only. Collection of Enrichment metadata derived for the user. Provides the Profile Geo Enrichment metadata. See referenced documentation for more information, including JSON data dictionaries. Example: “derived”: { “locations”: [ { “country”: “United States”, “country_code”: “US”, “locality”: “Denver” } ] } |
url | String | Nullable. A URL provided by the user in association with their profile. Example: “url”: “https://developer.twitter.com” |
description | String | Nullable. The user-defined UTF-8 string describing their account. Example: “description”: “The Real X API.” |
protected | Boolean | When true, indicates that this user has chosen to protect their Posts. See About Public and Protected Posts. Example: “protected”: true |
verified | Boolean | When true, indicates that the user has a verified account. See Verified Accounts. Example: “verified”: false |
followers_count | Int | The number of followers this account currently has. Under certain conditions of duress, this field will temporarily indicate “0”. Example: “followers_count”: 21 |
friends_count | Int | The number of users this account is following (AKA their “followings”). Under certain conditions of duress, this field will temporarily indicate “0”. Example: “friends_count”: 32 |
listed_count | Int | The number of public lists that this user is a member of. Example: “listed_count”: 9274 |
favourites_count | Int | The number of Posts this user has liked in the account’s lifetime. British spelling used in the field name for historical reasons. Example: “favourites_count”: 13 |
statuses_count | Int | The number of Posts (including retweets) issued by the user. Example: “statuses_count”: 42 |
created_at | String | The UTC datetime that the user account was created on X. Example: “created_at”: “Mon Nov 29 21:18:15 +0000 2010” |
profile_banner_url | String | The HTTPS-based URL pointing to the standard web representation of the user’s uploaded profile banner. By adding a final path element of the URL, it is possible to obtain different image sizes optimized for specific displays. For size variants, please see User Profile Images and Banners . Example: “profile_banner_url”: “https://si0.twimg.com/profile_banners/819797/1348102824” |
profile_image_url_https | String | An HTTPS-based URL pointing to the user’s profile image. Example: “profile_image_url_https”: “https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png” |
default_profile | Boolean | When true, indicates that the user has not altered the theme or background of their user profile. Example: “default_profile”: false |
default_profile_image | Boolean | When true, indicates that the user has not uploaded their own profile image and a default image is used instead. Example: “default_profile_image”: false |
No longer supported (deprecated) attributes
Field | Type | Description |
---|---|---|
utc_offset | null | Value will be set to null. Still available via GET account/settings |
time_zone | null | Value will be set to null. Still available via GET account/settings as tzinfo_name |
lang | null | Value will be set to null. Still available via GET account/settings as language |
geo_enabled | null | Value will be set to null. Still available via GET account/settings. This field must be true for the current user to attach geographic data when using POST statuses/update |
following | null | Value will be set to null. Still available via GET friendships/lookup |
follow_request_sent | null | Value will be set to null. Still available via GET friendships/lookup |
has_extended_profile | null | Deprecated. Value will be set to null. |
notifications | null | Deprecated. Value will be set to null. |
profile_location | null | Deprecated. Value will be set to null. |
contributors_enabled | null | Deprecated. Value will be set to null. |
profile_image_url | null | Deprecated. Value will be set to null. NOTE: Profile images are only available using the profile_image_url_https field. |
profile_background_color | null | Deprecated. Value will be set to null. |
profile_background_image_url | null | Deprecated. Value will be set to null. |
profile_background_image_url_https | null | Deprecated. Value will be set to null. |
profile_background_tile | null | Deprecated. Value will be set to null. |
profile_link_color | null | Deprecated. Value will be set to null. |
profile_sidebar_border_color | null | Deprecated. Value will be set to null. |
profile_sidebar_fill_color | null | Deprecated. Value will be set to null. |
profile_text_color | null | Deprecated. Value will be set to null. |
profile_use_background_image | null | Deprecated. Value will be set to null. |
is_translator | null | Deprecated. Value will be set to null. |
is_translation_enabled | null | Deprecated. Value will be set to null. |
translator_type | null | Deprecated. Value will be set to null. |
Example user object:
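As a sketch of how a User object might be consumed, the Python below maps a few attributes from the data dictionary onto a small dataclass. The sample dictionary is assembled from the example values in the table above rather than copied from a real payload, and the `User` class and `parse_user` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id_str: str
    screen_name: str
    name: str
    location: Optional[str] = None
    followers_count: int = 0
    verified: bool = False


def parse_user(obj):
    """Build a User from a JSON object, tolerating null or missing fields."""
    return User(
        id_str=obj["id_str"],  # preferred over the integer id
        screen_name=obj.get("screen_name") or "",
        name=obj.get("name") or "",
        location=obj.get("location"),
        followers_count=obj.get("followers_count") or 0,
        verified=bool(obj.get("verified")),
    )


# Values taken from the examples in the User data dictionary.
sample = {
    "id": 6253282,
    "id_str": "6253282",
    "name": "API",
    "screen_name": "api",
    "location": "San Francisco, CA",
    "verified": False,
    "followers_count": 21,
    "time_zone": None,  # deprecated attributes arrive as null
}

user = parse_user(sample)
print(user)
```

Keying on `id_str` rather than `screen_name` follows the guidance in the table, since screen names are unique but subject to change.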
Native Enriched Geo Objects
Posts can be associated with a location, generating a Post that has been ‘geo-tagged.’ Post locations can be assigned by using the X user-interface or when posting a Post using the API. Post locations can be an exact ‘point’ location, or an X Place with a ‘bounding box’ that describes a larger area ranging from a venue to an entire region.
There are three ‘root-level’ JSON objects used to describe the location associated with a Post: place, geo and coordinates.
Additionally, the native enriched format includes the profile geo enrichment’s derived location within the user object.
The place object is always present when a Post is geo-tagged with a place. Places are specific, named locations with corresponding geo coordinates. When users decide to assign a location to their Post, they are presented with a list of candidate X Places. When using the API to post, an X Place can be attached by specifying a place_id when posting. Posts associated with Places are not necessarily issued from that location but could also potentially be about that location.
The geo and coordinates objects are only present (non-null) when the Post is assigned an exact location. If an exact location is provided, the coordinates object will provide a [long, lat] array with the geographical coordinates, and an X Place that corresponds to that location will be assigned.
Place data dictionary
Field | Type | Description |
---|---|---|
id | String | ID representing this place. Note that this is represented as a string, not an integer. Example: “id”:“01a9a39529b27f36” |
url | String | URL representing the location of additional place metadata for this place. Example: “url”:“https://api.x.com/1.1/geo/id/01a9a39529b27f36.json” |
place_type | String | The type of location represented by this place. Example: “place_type”:“city” |
name | String | Short human-readable representation of the place’s name. Example: “name”:“Manhattan” |
full_name | String | Full human-readable representation of the place’s name. Example: “full_name”:“Manhattan, NY” |
country_code | String | Shortened country code representing the country containing this place. Example: “country_code”:“US” |
country | String | Name of the country containing this place. Example: “country”:“United States” |
bounding_box | Object | A bounding box of coordinates which encloses this place. Example: “bounding_box”: { “coordinates”: [ [ [ -74.026675, 40.683935 ], [ -74.026675, 40.877483 ], [ -73.910408, 40.877483 ], [ -73.910408, 40.3935 ] ] ], “type”: “Polygon” } |
attributes | Object | When using PowerTrack, 30-Day and Full-Archive Search APIs, and Volume Streams this hash is null. Example: “attributes”: {} |
Bounding box
Field | Type | Description |
---|---|---|
coordinates | Array of Array of Array of Float | A series of longitude and latitude points, defining a box which will contain the Place entity this bounding box is related to. Each point is an array in the form of [longitude, latitude]. Points are grouped into an array per bounding box. Bounding box arrays are wrapped in one additional array to be compatible with the polygon notation. Example: “coordinates”: [ [ [ -74.026675, 40.683935 ], [ -74.026675, 40.877483 ], [ -73.910408, 40.877483 ], [ -73.910408, 40.3935 ] ] ] |
type | String | The type of data encoded in the coordinates property. This will be “Polygon” for bounding boxes and “Point” for Posts with exact coordinates. Example: “type”:“Polygon” |
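The nesting described above (points inside a box array, wrapped in one more array) can be unpacked as in the following minimal Python sketch. The helper names `bbox_bounds` and `bbox_contains` are hypothetical, and the sample values are illustrative only; the sketch assumes the box is an axis-aligned rectangle.

```python
from typing import Tuple

def bbox_bounds(bounding_box: dict) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a Place bounding_box."""
    # coordinates is an array of boxes; the first (and only) box holds the points.
    points = bounding_box["coordinates"][0]
    lons = [point[0] for point in points]  # each point is [longitude, latitude]
    lats = [point[1] for point in points]
    return min(lons), min(lats), max(lons), max(lats)

def bbox_contains(bounding_box: dict, lon: float, lat: float) -> bool:
    """True if the given point lies inside (or on the edge of) the box."""
    min_lon, min_lat, max_lon, max_lat = bbox_bounds(bounding_box)
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Illustrative rectangle, not taken from a real payload.
sample_box = {
    "type": "Polygon",
    "coordinates": [[
        [-74.026675, 40.683935],
        [-74.026675, 40.877483],
        [-73.910408, 40.877483],
        [-73.910408, 40.683935],
    ]],
}
```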
Geo object data dictionary
Field | Type | Description |
coordinates | Collection of Float | The latitude and longitude of the Post’s location, as a collection in the form [latitude, longitude]. Note that this order is the reverse of the one used by the coordinates object. Example: “geo”: { “type”: “Point”, “coordinates”: [ 54.27784, -0.41068 ] } |
type | String | The type of data encoded in the coordinates property. This will be “Point” for Post coordinates fields. Example: “type”: “Point” |
Coordinates object data dictionary
Field | Type | Description |
coordinates | Collection of Float | The longitude and latitude of the Post’s location, as a collection in the form [longitude, latitude]. Example: “coordinates”: { “type”: “Point”, “coordinates”: [ -0.41068, 54.27784 ] } |
type | String | The type of data encoded in the coordinates property. This will be “Point” for Post coordinates fields. Example: “type”: “Point” |
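Because the geo object lists [latitude, longitude] while the coordinates object lists [longitude, latitude], it is easy to swap the two by accident. The sketch below shows one way to normalize both into a single (latitude, longitude) tuple. The function name `post_lat_lon` is hypothetical, and the sketch assumes both objects sit at the root level of the Post under the keys `coordinates` and `geo`.

```python
from typing import Optional, Tuple

def post_lat_lon(post: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a Post, or None if it has no exact point."""
    coords = post.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # coordinates object: [longitude, latitude]
        return (lat, lon)
    geo = post.get("geo")
    if geo and geo.get("type") == "Point":
        lat, lon = geo["coordinates"]  # geo object: [latitude, longitude]
        return (lat, lon)
    return None

with_coordinates = {"coordinates": {"type": "Point", "coordinates": [-0.41068, 54.27784]}}
with_geo = {"geo": {"type": "Point", "coordinates": [54.27784, -0.41068]}}
```

Both sample Posts resolve to the same point, which is a quick sanity check that the ordering has been handled correctly.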
Derived locations
Field | Type | Description |
derived | locations object | Derived location from the Profile Geo enrichment. Example: “derived”: { “locations”: [ { “country”: “United Kingdom”, “country_code”: “GB”, “locality”: “Yorkshire”, “region”: “England”, “full_name”: “Yorkshire, England, United Kingdom”, “geo”: { “coordinates”: [ -1.5, 54 ], “type”: “point” } } ] } |
Examples:
X entities
Introduction
Entities provide metadata and additional contextual information about content posted on X. The entities section provides arrays of common things included in Posts: hashtags, user mentions, links, stock tickers (symbols), X polls, and attached media. These arrays are convenient for developers when ingesting Posts, since X has essentially pre-processed, or pre-parsed, the text body. Instead of needing to explicitly search and find these entities in the Post body, your parser can go straight to this JSON section and read them directly.
Beyond providing parsing conveniences, the entities section also provides useful ‘value-add’ metadata. For example, if you are using the Enhanced URLs enrichment, URL metadata include fully-expanded URLs, as well as associated website titles and descriptions. Similarly, when a Post contains user mentions, the entities metadata include the numeric user IDs, which are useful when making requests to many X APIs.
Every Post JSON payload includes an entities section, with the minimum set of hashtags, urls, user_mentions, and symbols attributes, even if none of those entities are part of the Post message. For example, if you examine the JSON for a Post with a body of “Hello World!” and no attached media, the Post’s JSON will include the following content with entity arrays containing zero items:
Notes:
- media and polls entities will only appear when that type of content is part of the Post.
- if you are working with native media (photos, videos, or GIFs), the Extended Entities object is the way to go.
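As a rough illustration of the rules above, the following Python sketch counts entities while tolerating the optional media and polls arrays. The helper name `entity_counts` is hypothetical, and the sample Post is hand-built from the description above rather than copied from a real payload.

```python
# Arrays that are always present, and arrays that appear only when relevant.
ALWAYS_PRESENT = ("hashtags", "urls", "user_mentions", "symbols")
OPTIONAL = ("media", "polls")

def entity_counts(post: dict) -> dict:
    """Count the items in each entity array, treating missing arrays as empty."""
    entities = post.get("entities", {})
    return {key: len(entities.get(key, [])) for key in ALWAYS_PRESENT + OPTIONAL}

# A "Hello World!" Post with no media: the four core arrays exist but are empty.
hello_world = {
    "text": "Hello World!",
    "entities": {"hashtags": [], "urls": [], "user_mentions": [], "symbols": []},
}
```

Reading the optional arrays with a default avoids a lookup error on Posts that carry no media or poll.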
Entities object
The entities and extended_entities sections are both made up of arrays of entity objects. Below you will find descriptions for each of these entity objects, including data dictionaries that describe the object attribute names, types, and short description. We’ll also indicate which PowerTrack Operators match these attributes, and include some sample JSON payloads.
A collection of common entities found in Posts, including hashtags, links, and user mentions. This entities object does include a media attribute, but its implementation in the entities section is only completely accurate for Posts with a single photo. For all Posts with more than one photo, a video, or animated GIF, the reader is directed to the extended_entities section.
Entities data dictionary
The entities object is a holder of arrays of other entity sub-objects. After illustrating the entities structure, data dictionaries for these sub-objects, and the Operators that match them, will be provided.
Field | Type | Description |
---|---|---|
hashtags | Array of Hashtag Objects | Represents hashtags which have been parsed out of the Post text. Example: “hashtags”: [ { “indices”: [ 32, 38 ], “text”: “nodejs” } ] |
media | Array of Media Objects | Represents media elements uploaded with the Post. Example: “media”: [ { “display_url”: “pic.twitter.com/5J1WJSRCy9”, “expanded_url”: “https://twitter.com/nolan_test/status/930077847535812610/photo/1”, “id”: 9.300778475358126e17, “id_str”: “930077847535812610”, “indices”: [ 13, 36 ], “media_url”: “http://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg”, “media_url_https”: “https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg”, “sizes”: { “thumb”: { “h”: 150, “resize”: “crop”, “w”: 150 }, “large”: { “h”: 1366, “resize”: “fit”, “w”: 2048 }, “medium”: { “h”: 800, “resize”: “fit”, “w”: 1200 }, “small”: { “h”: 454, “resize”: “fit”, “w”: 680 } }, “type”: “photo”, “url”: “https://t.co/5J1WJSRCy9” } ] |
urls | Array of URL Objects | Represents URLs included in the text of a Post. Example (without Enhanced URLs enrichment enabled): “urls”: [ { “indices”: [ 32, 52 ], “url”: “http://t.co/IOwBrTZR”, “display_url”: “youtube.com/watch?v=oHg5SJ…”, “expanded_url”: “http://www.youtube.com/watch?v=oHg5SJYRHA0” } ] Example (with Enhanced URLs enrichment enabled): “urls”: [ { “url”: “https://t.co/D0n7a53c2l”, “expanded_url”: “http://bit.ly/18gECvy”, “display_url”: “bit.ly/18gECvy”, “unwound”: { “url”: “https://www.youtube.com/watch?v=oHg5SJYRHA0”, “status”: 200, “title”: “RickRoll’D”, “description”: “http://www.facebook.com/rickroll548 As long as trolls are still trolling, the Rick will never stop rolling.” }, “indices”: [ 62, 85 ] } ] |
user_mentions | Array of User Mention Objects | Represents other X users mentioned in the text of the Post. Example: “user_mentions”: [ { “name”: “Twitter API”, “indices”: [ 4, 15 ], “screen_name”: “twitterapi”, “id”: 6253282, “id_str”: “6253282” } ] |
symbols | Array of Symbol Objects | Represents symbols, i.e. $cashtags, included in the text of the Post. Example: “symbols”: [ { “indices”: [ 12, 17 ], “text”: “twtr” } ] |
polls | Array of Poll Objects | Represents X Polls included in the Post. Example: “polls”: [ { “options”: [ { “position”: 1, “text”: “I read documentation once.” }, { “position”: 2, “text”: “I read documentation twice.” }, { “position”: 3, “text”: “I read documentation over and over again.” } ], “end_datetime”: “Thu May 25 22:20:27 +0000 2017”, “duration_minutes”: 60 } ] |
Hashtag object
The entities section will contain a hashtags array containing an object for every hashtag included in the Post body, and include an empty array if no hashtags are present.
The PowerTrack # Operator is used to match on the text attribute. The has:hashtags Operator will match if there is at least one item in the array.
Field | Type | Description |
indices | Array of Int | An array of integers indicating the offsets within the Post text where the hashtag begins and ends. The first integer represents the location of the # character in the Post text string. The second integer represents the location of the first character after the hashtag. Therefore the difference between the two numbers will be the length of the hashtag name plus one (for the ‘#’ character). Example: “indices”:[32,38] |
text | String | Name of the hashtag, minus the leading ‘#’ character. Example: “text”:“nodejs” |
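The indices pair works as a half-open range over the Post text, so it can be used directly as a slice. The sketch below is a minimal Python illustration with a hypothetical helper name, `entity_substring`, and a made-up Post text. It assumes the offsets count Unicode code points, which matches how Python indexes strings; languages whose strings are indexed in UTF-16 units may need a conversion step.

```python
def entity_substring(text: str, entity: dict) -> str:
    """Return the slice of the Post text covered by an entity's indices."""
    start, end = entity["indices"]  # end points at the first character after the entity
    return text[start:end]

# Made-up example: the '#' sits at offset 9 and the hashtag ends before offset 16.
post_text = "Learning #nodejs today"
hashtag = {"indices": [9, 16], "text": "nodejs"}

covered = entity_substring(post_text, hashtag)
# The covered span is the hashtag name plus one character for the leading '#'.
span_matches = covered == "#" + hashtag["text"]
```

The same slicing applies to the indices of URL, user mention, and symbol objects.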
Media object
The entities section will contain a media array containing a single media object if any media object has been ‘attached’ to the Post. If no native media has been attached, there will be no media array in the entities section. For the following reasons, the extended_entities section should be used to process native Post media:
+ The media type will always indicate ‘photo’, even when a video or GIF is attached to the Post.
+ Even though up to four photos can be attached, only the first one will be listed in the entities section.
The has:media Operator will match if this array is populated.
Field | Type | Description |
display_url | String | URL of the media to display to clients. Example: “display_url”:“pic.twitter.com/rJC5Pxsu” |
expanded_url | String | An expanded version of display_url. Links to the media display page. Example: “expanded_url”: “http://twitter.com/yunorno/status/114080493036773378/photo/1” |
id | Int64 | ID of the media expressed as a 64-bit integer. Example: “id”:114080493040967680 |
id_str | String | ID of the media expressed as a string. Example: “id_str”:“114080493040967680” |
indices | Array of Int | An array of integers indicating the offsets within the Post text where the URL begins and ends. The first integer represents the location of the first character of the URL in the Post text. The second integer represents the location of the first non-URL character occurring after the URL (or the end of the string if the URL is the last part of the Post text). Example: “indices”:[15,35] |
media_url | String | An http:// URL pointing directly to the uploaded media file. Example: “media_url”:“http://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg” For media in direct messages, media_url is the same https URL as media_url_https and must be accessed by signing a request with the user’s access token using OAuth 1.0A. It is not possible to access images via an authenticated x.com session. Please visit this page to learn how to account for these recent changes. You cannot directly embed these images in a web page. See Photo Media URL formatting for how to format a photo’s URL, such as media_url_https, based on the available sizes. |
media_url_https | String | An https:// URL pointing directly to the uploaded media file, for embedding on https pages. Example: “media_url_https”:“https://p.twimg.com/AZVLmp-CIAAbkyy.jpg” For media in direct messages, media_url_https must be accessed by signing a request with the user’s access token using OAuth 1.0A. It is not possible to access images via an authenticated x.com session. Please visit this page to learn how to account for these recent changes. You cannot directly embed these images in a web page. See Photo Media URL formatting for how to format a photo’s URL, such as media_url_https, based on the available sizes. |
sizes | Size Object | An object showing available sizes for the media file. Example: “sizes”: { “thumb”: { “h”: 150, “resize”: “crop”, “w”: 150 }, “large”: { “h”: 1366, “resize”: “fit”, “w”: 2048 }, “medium”: { “h”: 800, “resize”: “fit”, “w”: 1200 }, “small”: { “h”: 454, “resize”: “fit”, “w”: 680 } } See Photo Media URL formatting for how to format a photo’s URL, such as media_url_https, based on the available sizes. |
source_status_id | Int64 | Nullable. For Posts containing media that was originally associated with a different Post, this ID points to the original Post. Example: “source_status_id”: 205282515685081088 |
source_status_id_str | String | Nullable. For Posts containing media that was originally associated with a different Post, this string-based ID points to the original Post. Example: “source_status_id_str”: “205282515685081088” |
type | String | Type of uploaded media. Possible types include photo, video, and animated_gif. Example: “type”:“photo” |
url | String | Wrapped URL for the media link. This corresponds with the URL embedded directly into the raw Post text, and the values for the indices parameter. Example:“url”:“http://t.co/rJC5Pxsu” |
Media size objects
All Posts with native media (photos, video, and GIFs) will include a set of ‘thumb’, ‘small’, ‘medium’, and ‘large’ sizes with height and width pixel sizes. For photos and preview image media URLs, Photo Media URL formatting specifies how to construct different URLs for loading different sized photo media.
Sizes object
Field | Type | Description |
thumb | Size Object | Information for a thumbnail-sized version of the media. Example: “thumb”: { “h”: 150, “resize”: “crop”, “w”: 150 } Thumbnail-sized photo media will be limited to fill a 150x150 boundary and cropped. |
large | Size Object | Information for a large-sized version of the media. Example: “large”: { “h”: 1366, “resize”: “fit”, “w”: 2048 } Large-sized photo media will be limited to fit within a 2048x2048 boundary. |
medium | Size Object | Information for a medium-sized version of the media. Example: “medium”: { “h”: 800, “resize”: “fit”, “w”: 1200 } Medium-sized photo media will be limited to fit within a 1200x1200 boundary. |
small | Size Object | Information for a small-sized version of the media. Example: “small”: { “h”: 454, “resize”: “fit”, “w”: 680 } Small-sized photo media will be limited to fit within a 680x680 boundary. |
Size object
Field | Type | Description |
w | Int | Width in pixels of this size. Example: “w”:150 |
h | Int | Height in pixels of this size. Example: “h”:150 |
resize | String | Resizing method used to obtain this size. A value of fit means that the media was resized to fit one dimension, keeping its native aspect ratio. A value of crop means that the media was cropped in order to fit a specific resolution. Example: “resize”:“crop” |
Photo Media URL Formatting
Photo media on X can be loaded in different sizes. It is best to load the smallest size image that is large enough to fit into a particular image viewport. To load different sizes, the Size Object and media_url (or media_url_https) need to be combined in a particular format. We’ll use the media entity example object already provided for our example in constructing a photo media URL.
The media_url or media_url_https on their own can be loaded, which will result in the medium variant being loaded by default. It is preferable, however, to provide a fully formatted photo media URL when possible.
There are three parts of a photo media URL:
Base URL | The base URL is the media URL without the file extension. For example: “media_url_https”: “https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg”, The base URL is then: https://pbs.twimg.com/media/DOhM30VVwAEpIHq |
Format | The format is the type of photo the image is formatted as. Possible formats are jpg or png, which is provided as the extension of the media URL. For example: “media_url_https”: “https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg”, The format is then: jpg |
Name | The name is the field name of the size to load. For example: “sizes”: { “thumb”: { “h”: 150, “resize”: “crop”, “w”: 150 }, “large”: { “h”: 1366, “resize”: “fit”, “w”: 2048 }, “medium”: { “h”: 800, “resize”: “fit”, “w”: 1200 }, “small”: { “h”: 454, “resize”: “fit”, “w”: 680 } } The name when loading the large-sized photo would be: large |
We take these three parts (base URL, format and name) and combine them into the photo media URL to load. There are 2 formats for loading images this way, legacy and modern. All image loads should stop using the legacy format and use the modern format. Using the modern format will result in better CDN hit rate for the caller, thus improving load latencies by being less likely to have to generate and load the media from the Data Center.
Legacy format | The legacy format is deprecated. Photo media loads should all move to the modern format. <base_url>.<format>:<name> For example: https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg:large |
Modern format | The modern format for loading photos was established at X in 2015 and has been the de facto standard since 2017. All photo media loads should move to this format. <base_url>?format=<format>&name=<name> For example: https://pbs.twimg.com/media/DOhM30VVwAEpIHq?format=jpg&name=large Note: the items in the query string for the photo media URL are in alphabetical order. If media loading were to add any additional query items, alphabetical ordering would continue to be necessary. For example, if there were a hypothetical new query item called preferred_format, it would go after format and name in the query string. |
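The three parts can be combined programmatically. The following Python sketch builds a modern-format URL and picks the smallest size that still covers a viewport. The function names `photo_url` and `pick_size` are hypothetical, and falling back to large when nothing is big enough is an assumption of this sketch rather than documented behavior.

```python
from urllib.parse import urlencode

# Size names from smallest to largest; "thumb" is skipped because it is cropped.
FIT_SIZES = ("small", "medium", "large")

def pick_size(sizes: dict, viewport_w: int, viewport_h: int) -> str:
    """Pick the smallest 'fit' size that covers the viewport (assumed fallback: large)."""
    for name in FIT_SIZES:
        size = sizes.get(name)
        if size and size["w"] >= viewport_w and size["h"] >= viewport_h:
            return name
    return "large"

def photo_url(media_url_https: str, name: str) -> str:
    """Build <base_url>?format=<format>&name=<name> from a media URL and a size name."""
    # Split off the file extension: it becomes the format, the rest is the base URL.
    base_url, _, fmt = media_url_https.rpartition(".")
    # Sorting keeps the query items in alphabetical order, as the format requires.
    query = urlencode(sorted({"format": fmt, "name": name}.items()))
    return f"{base_url}?{query}"

sizes = {
    "thumb": {"h": 150, "resize": "crop", "w": 150},
    "small": {"h": 454, "resize": "fit", "w": 680},
    "medium": {"h": 800, "resize": "fit", "w": 1200},
    "large": {"h": 1366, "resize": "fit", "w": 2048},
}
media_url_https = "https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg"
url_for_viewport = photo_url(media_url_https, pick_size(sizes, 1000, 700))
```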
URL object
The entities section will contain a urls array containing an object for every link included in the Post body, and include an empty array if no links are present.
The has:links Operator will match if there is at least one item in the array. The url: Operator is used to match on the expanded_url attribute. If you are using the Expanded URL enrichment, the url: Operator is used to match on the unwound.url (fully unwound URL) attribute. If you are using the Enhanced URL enrichment, the url_title: and url_description: Operators are used to match on the unwound.title and unwound.description attributes.
Field | Type | Description |
display_url | String | URL pasted/typed into Post. Example: “display_url”:“bit.ly/2so49n2” |
expanded_url | String | Expanded version of display_url . Example:“expanded_url”:“http://bit.ly/2so49n2” |
indices | Array of Int | An array of integers representing offsets within the Post text where the URL begins and ends. The first integer represents the location of the first character of the URL in the Post text. The second integer represents the location of the first non-URL character after the end of the URL. Example: “indices”:[30,53] |
url | String | Wrapped URL, corresponding to the value embedded directly into the raw Post text, and the values for the indices parameter. Example: “url”:“https://t.co/yzocNFvJuL” |
If you are using the Expanded and/or Enhanced URL enrichments, the following metadata is available under the unwound attribute:
Field | Type | Description |
url | String | The fully unwound version of the link included in the Post. Example: “url”:“https://blog.twitter.com/en_us/topics/insights/2016/using-twitter-as-a-go-to-communication-channel-during-severe-weather-events.html” |
status | Int | Final HTTP status of the unwinding process, a ‘200’ indicating success. Example: 200 |
title | String | HTML title for the link. Example: “title”:“Using Twitter as a ‘go-to’ communication channel during severe weather” |
description | String | HTML description for the link. Example: “description”:“Using Twitter as a ‘go-to’ communication channel during severe weather” |
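Since the unwound attribute is only present when an enrichment is enabled, a consumer usually needs a fallback chain. The Python sketch below shows one possible order of preference; the helper name `best_url` is hypothetical, and preferring the unwound URL only when the status is 200 is a design choice of this sketch, not a documented rule.

```python
def best_url(url_entity: dict) -> str:
    """Return the most fully resolved URL available for a URL entity."""
    unwound = url_entity.get("unwound") or {}
    # Prefer the fully unwound URL, but only if the unwinding process succeeded.
    if unwound.get("url") and unwound.get("status") == 200:
        return unwound["url"]
    # Otherwise fall back to the expanded URL, then to the wrapped t.co URL.
    return url_entity.get("expanded_url") or url_entity["url"]

enriched = {
    "url": "https://t.co/D0n7a53c2l",
    "expanded_url": "http://bit.ly/18gECvy",
    "display_url": "bit.ly/18gECvy",
    "unwound": {"url": "https://www.youtube.com/watch?v=oHg5SJYRHA0", "status": 200},
    "indices": [62, 85],
}
plain = {
    "url": "http://t.co/IOwBrTZR",
    "expanded_url": "http://www.youtube.com/watch?v=oHg5SJYRHA0",
    "indices": [32, 52],
}
```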
User mention object
The entities section will contain a user_mentions array containing an object for every user mention included in the Post body, and include an empty array if no user mention is present.
The PowerTrack @ Operator is used to match on the screen_name attribute. The has:mentions Operator will match if there is at least one item in the array.
Field | Type | Description |
id | Int64 | ID of the mentioned user, as an integer. Example: “id”:6253282 |
id_str | String | ID of the mentioned user, as a string. Example: “id_str”:“6253282” |
indices | Array of Int | An array of integers representing the offsets within the Post text where the user reference begins and ends. The first integer represents the location of the ‘@’ character of the user mention. The second integer represents the location of the first non-screenname character following the user mention. Example: “indices”:[4,15] |
name | String | Display name of the referenced user. Example: “name”:“API” |
screen_name | String | Screen name of the referenced user. Example: “screen_name”:“api” |
Symbol object
The entities section will contain a symbols array containing an object for every $cashtag included in the Post body, and include an empty array if no symbol is present.
The PowerTrack $ Operator is used to match on the text attribute. The has:symbols Operator will match if there is at least one item in the array.
Field | Type | Description |
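indices | Array of Int | An array of integers indicating the offsets within the Post text where the symbol/cashtag begins and ends. The first integer represents the location of the $ character in the Post text string. The second integer represents the location of the first character after the cashtag. Therefore the difference between the two numbers will be the length of the cashtag name plus one (for the ‘$’ character). Example: “indices”:[12,17] |
text | String | Name of the cashtag, minus the leading ‘$’ character. Example: “text”:“twtr” |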
Poll object
The entities section will contain a polls array containing a single poll object if the Post contains a poll. If no poll is included, there will be no polls array in the entities section.
Note that these Poll metadata are only available with the following Enterprise APIs:
- Volume streams (Decahose)
- Real-time PowerTrack
- X Search APIs (Full-Archive Search and 30-Day Search)
Field | Type | Description |
options | Array of Option Object | An array of options, each having a poll position, and the text for that position. Example: “options”: [ { “position”: 1, “text”: “I read documentation once.” } ] |
end_datetime | String | Time stamp (UTC) of when poll ends. Example: “end_datetime”: “Thu May 25 22:20:27 +0000 2017” |
duration_minutes | Int | Duration of poll in minutes. Example: “duration_minutes”: 60 |
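The end_datetime string and duration_minutes value can be combined to estimate when a poll opened. The Python sketch below is one way to do that; the helper name `poll_window` is hypothetical, and deriving the start time as the end time minus the duration is an inference from the two field descriptions, not a documented guarantee.

```python
from datetime import datetime, timedelta

# Matches timestamps such as "Thu May 25 22:20:27 +0000 2017".
POLL_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def poll_window(poll: dict):
    """Return (start, end) as timezone-aware datetimes for a poll object."""
    end = datetime.strptime(poll["end_datetime"], POLL_TIME_FORMAT)
    # int() tolerates the duration arriving as either a number or a numeric string.
    start = end - timedelta(minutes=int(poll["duration_minutes"]))
    return start, end

poll = {
    "options": [{"position": 1, "text": "I read documentation once."}],
    "end_datetime": "Thu May 25 22:20:27 +0000 2017",
    "duration_minutes": 60,
}
start, end = poll_window(poll)
```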
Retweet and Quote Tweet details
From the X API perspective, Retweets and Quote Tweets are special kinds of Posts that contain the original Post as an embedded object. Retweet and Quote Tweet objects are therefore parents of a child ‘original’ Post (and thus double the size). Retweets have a top-level “retweeted_status” object, and Quote Tweets have a “quoted_status” object. For consistency, these top-level Retweet and Quote Tweet objects also have a text property and associated entities. However, the entities at the top level can differ from the entities provided by the embedded ‘original’ Post. In the case of Retweets, new text is prepended to the original Post body. For Quote Tweets, new text is appended to the Post body.
In general, the best practice is to retrieve the text, entities, original author and date from the original Post in retweeted_status whenever this exists. An exception is getting X entities that are part of the additive Quote. See below for more details and tips.
Retweets
An important detail with Retweets is that no additional X entities can be added to the Post. Users cannot add hashtags, URLs or other details when they Retweet. However, the Retweet (top-level) text attribute is composed of the original Post text with “RT @username: ” prepended.
In some cases, especially with accounts with long user names, the combination of these new characters and the original Post body can easily exceed the original Post text length limit of 140 characters. In order to preserve support for 140 character based display and storage, the top-level body truncates the end of the Post body and adds an ellipsis (“…”). Consequently, some top-level entities positioned at the end of the original Post might be incorrect or missing, for instance in the case of a truncated hashtag or URL entry.
This Post, https://twitter.com/FloodSocial/status/907974220298125312, has the following Post text:
Just another test Post that needs to be exactly 140 characters with trailing URL and hashtag http://wapo.st/2w8iwPQ #Testing
In the above example, both the URL and hashtag were affected. Since the hashtag was completely truncated and the URL partially truncated, these are missing from the top-level entities. You will also notice the additional user_mentions top-level entity coming from the “RT @floodsocial: ” prefix on the text field.
However, the Post text and entities in retweeted_status perfectly reflect the original Post with no truncation or incorrect entities, hence our recommendation to rely on the nested retweeted_status object for Retweets.
Quote Tweets
Quote Tweets were introduced in 2016, and differ from Retweets in that when you “quote” a Post, you are adding new content “on top” of a shared Post. This new content can include nearly anything an original Post can have, including new text, hashtags, mentions, and URLs.
Quote Tweets can contain native media (photos, videos, and GIFs), and will appear under the entities object.
Since X entities can be added, the Quote entities are likely different from the original entities.
In this example, a new URL and hashtag were positioned at the end of the Quote Tweet.
This Post, https://twitter.com/FloodSocial/status/907983973225160704, has the following Post text:
strange and equally tragic when islands flood… trans-atlantic testing of quote tweets | @thisuser @thatuserhttp://bit.ly/2vMMDuu #testing
In this case, the top-level entities do not reflect the Quote details.
However, the Post text and entities in extended_tweet perfectly reflect the Quote Tweet with no truncation or incorrect entities, hence our recommendation to rely on the nested extended_tweet object for Quote Tweets.
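The two recommendations above (prefer retweeted_status for Retweets, and extended_tweet where it exists) can be folded into one lookup. The Python sketch below is a minimal illustration; the function name `canonical_content` is hypothetical, and it assumes the extended_tweet object carries its text under a full_text key alongside its own entities.

```python
def canonical_content(post: dict):
    """Return (text, entities) from the most complete source inside a Post."""
    # For Retweets, the nested original Post is untruncated, so read from it instead.
    if "retweeted_status" in post:
        return canonical_content(post["retweeted_status"])
    extended = post.get("extended_tweet")
    if extended:
        # Assumption: extended_tweet holds "full_text" plus a complete "entities" object.
        text = extended.get("full_text", post["text"])
        entities = extended.get("entities", post.get("entities", {}))
        return text, entities
    return post["text"], post.get("entities", {})

# Hand-built Retweet: the top-level text is truncated and has lost its hashtag.
retweet = {
    "text": "RT @floodsocial: Just another test Post that needs to be…",
    "entities": {"hashtags": [], "user_mentions": [{"screen_name": "floodsocial"}]},
    "retweeted_status": {
        "text": "Just another test Post http://wapo.st/2w8iwPQ #Testing",
        "entities": {"hashtags": [{"text": "Testing"}], "user_mentions": []},
    },
}
text, entities = canonical_content(retweet)
```

Content that a Quote Tweet shares from another Post still lives in quoted_status and would be read separately.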
Entities for user object
Entities for User Objects describe URLs that appear in the user-defined profile URL and description fields. They do not describe hashtags or user_mentions. Unlike Post entities, user entities can apply to multiple fields within their parent object. To disambiguate, you will find parent nodes called url and description that indicate which field contains the entitized URL.
In this example, the user url field contains a t.co link that is fully expanded within the entities/url/urls[0] node of the response. The user does not have a wrapped URL in their description.
JSON example
X extended entities
Introduction
If a Post contains native media (shared with the Post user-interface as opposed to via a link to elsewhere), there will also be an extended_entities section. When it comes to any native media (photo, video, or GIF), the extended_entities section is the preferred metadata source for several reasons. Currently, up to four photos can be attached to a Post. The entities metadata will only contain the first photo (until 2014, only one photo could be included), while the extended_entities section will include all attached photos. With native media, another deficiency of the entities.media metadata is that the media type will always indicate ‘photo’, even in cases where the attached media is a video or animated GIF. The actual type of media is specified in the extended_entities.media[].type attribute and is set to either photo, video, or animated_gif. For these reasons, if you are working with native media, the extended_entities metadata is the way to go.
All Posts with attached photos, videos and animated GIFs will include an extended_entities JSON object. The extended_entities object contains a single media array of media objects (see the entities section for its data dictionary). No other entity types, such as hashtags and links, are included in the extended_entities section. The media object in the extended_entities section is identical in structure to the one included in the entities section.
A Post can have only one type of media attached to it. For photos, up to four photos can be attached. For videos and GIFs, one can be attached. Since the media type metadata in the extended_entities section correctly indicates the media type (‘photo’, ‘video’ or ‘animated_gif’), and supports up to 4 photos, it is the preferred metadata source for native media.
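In code, this preference amounts to reading extended_entities.media first and falling back to entities.media only when it is missing. The Python sketch below uses hypothetical helper names (`native_media`, `media_types`) and a hand-built Post in which the entities section mislabels a video as a photo.

```python
def native_media(post: dict) -> list:
    """Return the media objects for a Post, preferring extended_entities."""
    extended = post.get("extended_entities", {}).get("media")
    if extended:
        return extended
    # Fallback: entities.media lists at most one item and always reports 'photo'.
    return post.get("entities", {}).get("media", [])

def media_types(post: dict) -> list:
    """List the type of every attached media object."""
    return [media["type"] for media in native_media(post)]

# Hand-built Post with a native video attached.
video_post = {
    "entities": {"media": [{"id_str": "1", "type": "photo"}]},
    "extended_entities": {"media": [{"id_str": "1", "type": "video"}]},
}
```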
Example Posts and JSON payloads
Below are some example Posts and their associated entities metadata.
Post with four native photos
Post with hashtag, user mention, cashtag, URL, and four native photos:
Here is the entities section for this Post:
Only in this ‘extended’ payload below will you find the four (maximum) native photos. Notice that the first photo in the array is the same as the single photo included in the non-extended X entities section. The media metadata structure for photos is the same for both entities and extended_entities sections.
Here is the extended_entities section for this Post:
Post with native video
Below is the extended entities metadata for this Post with a video:
When an advertiser chooses to limit video playback to just X owned and operated platforms, the video_info object will be replaced with an additional_media_info object.
The additional_media_info will contain additional media info provided by the publisher, such as title, description and embeddable flag. Video content is made available only to X official clients when embeddable=false. In this case, all video URLs provided in the payload will be X-based, so the user can open the video in a X owned property by clicking the link.
Here is an example of what the extended entities object will look like in this situation:
As discussed above, here is the entities section that incorrectly has the type set to ‘photo’. Again, the extended_entities section is preferred for all native media types, including ‘video’ and ‘animated_gif’.
Post with an animated GIF
Below is the extended entities metadata for this Post with an animated GIF:
Native Enriched example payloads
Post
Post reply
Extended Post
Post with extended_entities
Retweet
Quote Tweet
Retweeted Quote Tweet
Enterprise Activity Streams data objects
Interested in learning more about how the Activity Streams data format maps to the X API v2 format?
Check out our comparison guide: Activity Streams compared to X API v2
Please note: It is highly recommended to use the Enriched Native format for enterprise data APIs.
- The Enriched Native format includes all new metadata since 2017, such as poll metadata, and additional metrics such as reply_count and quote_count.
- Activity Streams format has not been updated with new metadata or enrichments since the character update in 2017.
Activity Object
Activity Streams is an object schema translation of X’s original data format, created by Gnip to ‘normalize the format’ of Post data and other social media data using the third-party Activity Base Schema described here. Posts are normalized into the Activity Streams schema, including note, person, place and service object types as nested objects. Posts can have other nested Post activity objects for Retweets, or others including twitter_quoted_status and long_object.
The base level object type “activity” is similar to the Post base level object of the native enriched format. Example payloads in activity streams format can be found here.
Data Dictionary
Below you will find the data dictionary for these ‘root-level’ “activity” attributes, as well as links to child object data dictionaries.
Attribute | Type | Description |
id | string | A unique IRI for the post. In more detail, “tag” is the scheme, “search.x.com” represents the domain for the scheme, and 2005 is when the scheme was derived. When storing Posts, this should be used as the unique identifier or primary key. “id”: “tag:search.x.com,2005:1050118621198921728” |
objectType | string | Type of object, always set to “activity” “objectType”: “activity” |
object | object | An object representing post being posted or shared. For Retweets, this will contain an entire “activity”, with the pertinent fields described in this schema. For Original posts, this will contain a “note” object, with the fields described here. “object”: “object”: “objectType”: “note”, “id”: “object:search.x.com,2005:1050118621198921728”, “summary”: “To make room for more expression, we will now count all emojis as equal—including those with gender and skin t… https://t.co/MkGjXf9aXm”, “link”: “http://twitter.com/TwitterAPI/statuses/1050118621198921728”, “postedTime”: “2018-10-10T20:19:24.000Z” |
long_object | object | An object representing the full text body if the post text extends beyond 140 characters. “long_object”: “body”: “To make room for more expression, we will now count all emojis as equal—including those with gender and skin tone modifiers 👍🏻👍🏽👍🏿. This is now reflected in Twitter-Text, our Open Source library. \n\nUsing Twitter-Text? See the forum post for detail: https://t.co/Nx1XZmRCXA”, “display_text_range”: [ 0, 277 ], “twitter_entities”: “hashtags”: [], “urls”: [ “url”: “https://t.co/Nx1XZmRCXA”, “expanded_url”: “https://twittercommunity.com/t/new-update-to-the-twitter-text-library-emoji-character-count/114607”, “display_url”: “twittercommunity.com/t/new-update-t…”, “indices”: [ 254, 277 ] ], “user_mentions”: [], “symbols”: [] |
display_text_range | array | The range of displayed characters within the long_object body, present if the post text extends beyond 140 characters. “display_text_range”: [ 0, 142 ] |
verb | string | The type of action being taken by the user: “post” for Posts, “share” for Retweets, and “delete” for deleted Posts. The verb is the proper way to distinguish between a Tweet and a true Retweet. However, this only applies to true retweets, and not modified or quoted Tweets, which don’t use X Retweet functionality. For a description of AS verbs click here. For Deletes, note that only a limited number of fields will be included, as shown in the sample payload below. “verb”: “post” |
postedTime | date (ISO 8601) | The time the action occurred, e.g. the time the post was posted. “postedTime”: “2018-10-10T20:19:24.000Z” |
generator | object | An object representing the utility used to post the post. This will contain the name (“displayName”) and a link (“link”) for the source application generating the Post. “generator”: “displayName”: “Twitter Web Client”, “link”: “http://twitter.com” |
provider | object | A JSON object representing the provider of the activity. This will contain an objectType (“service”), the name of the provider (“displayName”), and a link to the provider’s website (“link”). “provider”: “objectType”: “service”, “displayName”: “Twitter”, “link”: “http://www.twitter.com” |
link | string | A Permalink for the post. “link”: “http://twitter.com/TwitterAPI/statuses/1050118621198921728” |
body | string | The post text. In Retweets, note that X modifies the value of the body at the root level by adding “RT @username” at the beginning, and by truncating the original text and adding an ellipsis at the end. Thus, for Retweets, your app should look at the object.body to ensure that it is extracting the non-modified text of the original Post (being retweeted). “body”: “With Cardiff, Crystal Palace, and Hull City joining the EPL from the Championship it will be a great relegation battle at the end.” |
display_text_range | array | Describes the range of characters within the body text that indicates the displayed Post. For Posts with leading @mentions, the range will start at more than 0. For Posts with attached media, or that extend beyond 140 characters, the display_text_range is given in the long_object. “display_text_range”: [ 14, 42 ] or “long_object”: “display_text_range”: [ 0, 277 ]… |
actor | object | An object representing the X user who posted. The actor object refers to an X User, and contains all metadata relevant to that user. See actor object details |
inReplyTo | object | A JSON object referring to the Post being replied to, if applicable. Contains a link to the Post. “inReplyTo”: “link”: “http:\/\/twitter.com\/GOP\/statuses\/349573991561838593” |
location | object | A JSON object representing the X “Place” where the post was created. This is an object passed through from the X platform. See location object |
twitter_entities | object | The entities object from X’s data format which contains lists of urls, mentions and hashtags. Please reference the X documentation on Entities here Note that in Retweets, X may truncate the values of entities that it extracts at the root level. So, for Retweets, your app should look at object.twitter_entities to ensure that you are using non-truncated values. See twitter_entities object details |
twitter_extended_entities | object | An object from X’s native data format containing “media”. This will be present for any post where the twitter_entities object has data present in the “media” field, and will include multiple photos where present in the post. Note that this is the correct location to retrieve media information for multi-photo posts. Multiple photos are represented by comma-separated JSON objects within the “media” array. See twitter_extended_entities object details |
gnip | object | An object added to the activity payload to indicate the matching rules, and added enriched data based on enrichments active on the stream or product. See gnip object details |
edit_history | Object | Unique identifiers indicating all versions of a Post. For Posts with no edits, there will be one ID. For Posts with an edit history, there will be multiple IDs, arranged in ascending order reflecting the order of edits, with the most recent version in the last position of the array. The Post IDs can be used to hydrate and view previous versions of a Post. Example: “edit_history”: { “initial_tweet_id”: “1283764123”, “edit_tweet_ids”: [“1283764123”, “1394263866”] } |
edit_controls | Object | When present, indicates how long a Post remains editable and the number of remaining edits. Posts are only editable for the first 30 minutes after creation and can be edited up to five times. Example: “edit_controls”: { “editable_until_ms”: 123, “edits_remaining”: 3 } |
editable | Boolean | When present, indicates if a Post was eligible for edit when published. This field is not dynamic and won’t toggle from True to False when a Post reaches its editable time limit, or maximum number of edits. The following Post features will cause this field to be false: * Post is promoted * Post has a poll * Post is a non-self-thread reply * Post is a Retweet (note that Quote Tweets are eligible for edit) * Post is nullcast * Community Post * Superfollow Post * Collaborative Post |
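The body and long_object rows above imply a small lookup order when you want the unmodified text of a Post. The following is a minimal Python sketch of that lookup, not an official client: the function name and the sample payload are hypothetical, and it assumes a Retweet's nested object may carry its own long_object.

```python
def original_text(activity: dict) -> str:
    """Return the unmodified Post text for an Activity Streams activity."""
    # For Retweets (verb "share") the root-level body is prefixed with
    # "RT @username" and may be truncated, so read the nested object instead.
    if activity.get("verb") == "share":
        source = activity.get("object", {})
    else:
        source = activity
    # Prefer long_object.body when the text extends beyond 140 characters.
    long_object = source.get("long_object") or {}
    return long_object.get("body") or source.get("body", "")


# Hypothetical, heavily abbreviated Retweet activity.
retweet = {
    "verb": "share",
    "body": "RT @XDevelopers: A long announcement that gets cut...",
    "object": {
        "verb": "post",
        "body": "A long announcement that gets cut...",
        "long_object": {"body": "A long announcement that gets cut off at the root level."},
    },
}
print(original_text(retweet))
```

The same function works for original Posts, where it falls back to the root-level body when no long_object is present.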
Additional Post attributes
Attribute | Type | Description |
---|---|---|
twitter_lang | string | |
favoritesCount | int | Nullable. Indicates approximately how many times this Post has been liked by X users. “favoritesCount”:298 |
retweetCount | int | Number of times this Post has been retweeted. Example: “retweetCount”:153 |
Deprecated Attributes
Field | Type | Description |
---|---|---|
geo | object | Point location where the Post was created. |
twitter_filter_level | string | Deprecated field, retained to avoid a breaking change. |
Nested Post activity objects
In several cases, a Post object will include other nested Posts. If you are working with nested objects, then that JSON payload will contain multiple objects, and each Post object may contain its own objects. The root-level object will contain information on the type of action taken, i.e. whether it is a Retweet or a Quote Tweet, and may also contain an object that describes the ‘original’ Post being shared. Posts whose text extends beyond 140 characters will include a nested long_object, which was introduced to prevent breaking changes when the character update was made in 2017. Each nested object dictionary is described below.
Retweets
In Activity Streams format, a Retweet has the verb “share” and includes a nested object of type “activity” that represents the original Post being Retweeted.
X quoted status
Activity Streams format of embedded Quote Tweets
{ "id": "tag:search.twitter.com,2005:222222222222", "objectType": "activity", "verb": "post", "body": "Quoting a Tweet: https://t.co/mxiFJ59FlB", "actor": { "displayName": "TheQuoter2" }, "object": { "objectType": "note", "id": "object:search.twitter.com,2005:111111111", "summary": "https://t.co/mxiFJ59FlB" }, "twitter_entities": {}, "twitter_extended_entities": {}, "gnip": {}, "twitter_quoted_status": { "id": "tag:search.twitter.com,2005:111111111", "objectType": "activity", "verb": "post", "body": "console.log('Happy birthday, JavaScript!');", "actor": { "displayName": "TheOriginalTweeter" }, "object": { "objectType": "note", "id": "object:search.twitter.com,2005:111111111" }, "twitter_entities": {} } }
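One way to tell these activity types apart in code is to check the verb first and then look for the nested twitter_quoted_status and inReplyTo objects. This is a hedged Python sketch: the function name and return labels are illustrative choices, and a Retweet of a Quote Tweet is simply reported as a Retweet here.

```python
def classify_activity(activity: dict) -> str:
    """Label an Activity Streams activity by the kind of Post it represents."""
    verb = activity.get("verb")
    if verb == "delete":
        return "delete"
    if verb == "share":
        # True Retweets; the original Post is nested under "object".
        return "retweet"
    if "twitter_quoted_status" in activity:
        return "quote"
    if "inReplyTo" in activity:
        return "reply"
    return "post"


# Abbreviated version of the Quote Tweet payload shown above.
quote = {
    "verb": "post",
    "body": "Quoting a Tweet: https://t.co/mxiFJ59FlB",
    "twitter_quoted_status": {"verb": "post", "body": "console.log('Happy birthday, JavaScript!');"},
}
print(classify_activity(quote))
```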
Retweeted Quote Tweet:
Long object
Activity streams format of the extended_tweet
Actor object
The actor object contains X User account metadata that describes the X User who created the activity.
Data Dictionary
Attribute | Type | Description |
---|---|---|
objectType | string | ”objectType”: “person” |
id | string | The string representation of the unique identifier for this author. Example: “id:x.com:2244994945” |
link | string | Example: “link”: “http://www.x.com/XDevelopers” |
displayName | String | The name of the user, as they’ve defined it. Not necessarily a person’s name. Typically capped at 50 characters, but subject to change. Example: “displayName”: “XDevelopers” |
preferredUsername | string | The screen name, handle, or alias that this user identifies themselves with. Unique but subject to change. Use id as a user identifier whenever possible. Typically a maximum of 15 characters long, but some historical accounts may exist with longer names. Example: “preferredUsername”: “XDevelopers” |
location | object | Example: “location”: { “objectType”: “place”, “displayName”: “127.0.0.1” } |
links | array | Nullable. A URL provided by the user in association with their profile. Example: “links”: [ { “href”: “https://developer.twitter.com/en/community”, “rel”: “me” } ] |
summary | string | Nullable. The user-defined UTF-8 string describing their account. Example: “summary”: “The voice of the #XDevelopers team…” |
protected | Boolean | When true, indicates that this user has chosen to protect their Posts. See About Public and Protected Posts. Example: “protected”: true |
verified | Boolean | When true, indicates that the user has a verified account. See Verified Accounts . Example: “verified”: false |
followersCount | Int | The number of followers this account currently has. Under certain conditions of duress, this field will temporarily indicate “0”. Example: “followersCount”: 21 |
friendsCount | Int | The number of users this account is following (AKA their “followings”). Under certain conditions of duress, this field will temporarily indicate “0”. Example: “friendsCount”: 32 |
listedCount | Int | The number of public lists that this user is a member of. Example: “listedCount”: 9274 |
favoritesCount | Int | The number of Posts this user has liked in the account’s lifetime. The native format names this field favourites_count, using British spelling for historical reasons. Example: “favoritesCount”: 13 |
statusesCount | Int | The number of Posts (including Retweets) issued by the user. Example: “statusesCount”: 42 |
postedTime | date | The UTC datetime that the user account was created on X. Example: “postedTime”: “2013-12-14T04:35:55.036Z” |
image | string | An HTTPS-based URL pointing to the user’s profile image. Example: “image”: “https://pbs.twimg.com/profile_images/1283786620521652229/lEODkLTh_normal.jpg” |
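Because the actor id is a prefixed string and postedTime is an ISO 8601 timestamp, a little parsing is usually needed before storing them. The Python sketch below assumes the id always ends in the numeric user ID after the last colon, as in the example above; the helper names are hypothetical.

```python
from datetime import datetime, timezone


def actor_user_id(actor: dict) -> int:
    """Extract the numeric user ID from an id such as "id:x.com:2244994945"."""
    # Assumption: the numeric ID is the final colon-separated segment.
    return int(actor["id"].rsplit(":", 1)[-1])


def account_created(actor: dict) -> datetime:
    """Parse the actor's postedTime (account creation time) as a UTC datetime."""
    parsed = datetime.strptime(actor["postedTime"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


actor = {"id": "id:x.com:2244994945", "postedTime": "2013-12-14T04:35:55.036Z"}
print(actor_user_id(actor), account_created(actor).isoformat())
```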
No longer supported (deprecated) attributes
Field | Type | Description |
---|---|---|
utcOffset | null | Value will be set to null. Still available via GET account/settings |
twitterTimeZone | null | Value will be set to null. Still available via GET account/settings as tzinfo_name |
languages | null | Value will be set to null. Still available via GET account/settings as language |
Examples:
Location Object
Location objects can exist within the actor object, set at the X account level, or within the profileLocations object of the gnip object. Location objects have a place object type and can have a name, address, or geo coordinates. Location objects are similar to Geo in native enriched format.
Location data dictionary
Field | Type | Description |
---|---|---|
objectType | string | See here for more detailed information. Example: “objectType”: “place” |
displayName | string | The full name of the location. Example: “displayName”: “United States” |
name | string | Name of the location from X’s place JSON format. |
link | string | A link to the full X JSON representation of the place. “link”: “https://api.x.com/1.1/geo/id/27c45d804c777999.json” |
geo | object | The geo coordinates object from X. Either a polygon or a point. See geo |
countryCode | String | Shortened country code representing the country containing this place. Example: “countryCode”: “US” |
country | String | Name of the country containing this place. Example: “country”: “United States” |
profileLocations derived objects
Field | Type | Description |
---|---|---|
address | object | Within the profileLocations location object of the gnip object. Address of the location derived by the Profile Geo enrichment. Level of granularity will vary. “address”: { “country”: “United States”, “countryCode”: “US”, “locality”: “Providence”, “region”: “Rhode Island”, “subRegion”: “Providence County” } |
geo | object | Within the profileLocations location object of the gnip object. Centroid coordinates of the location derived by the Profile Geo enrichment. “geo”: { “coordinates”: [ -98.5, 39.76 ], “type”: “point” } |
Examples
X entities object
For Activity streams format, the twitter_entities is the same format and data dictionary shown on the native enriched format entities object here.
Example:
X extended entities object
For Activity streams format, the twitter_extended_entities is the same format and data dictionary shown on the native enriched format extended_entities object here.
Example:
Gnip object
The gnip object, within Activity streams format, contains the metadata added by the active enrichments, as well as indication of the matching rules for the activity.
Data dictionary
Field | Type | Description |
---|---|---|
matching_rules | array | Contains an array of matching rule objects which indicate the rules that the activity matched. “matching_rules”: [ { “tag”: null, “id”: 1026514022567358500, “id_str”: “1026514022567358464” } ] |
urls | array | Contains an array of the links within the activity, and the expanded URL metadata from the URL unwinding enrichment. “urls”: [ { “url”: “https://t.co/tGQqNxxyhU”, “expanded_url”: “https://www.youtube.com/channel/UCwUxW2CV2p5mzjMBqvqLzJA”, “expanded_status”: 200, “expanded_url_title”: “Birdys Daughter”, “expanded_url_description”: “Premium, single-origin, handpicked Jamaica Blue Mountain Coffee” } ] |
profileLocations | array of location objects | Contains the derived location object from the Profile Geo enrichment. “profileLocations”: [ { “address”: { “country”: “Canada”, “countryCode”: “CA”, “locality”: “Toronto”, “region”: “Ontario” }, “displayName”: “Toronto, Ontario, Canada”, “geo”: { “coordinates”: [ -79.4163, 43.70011 ], “type”: “point” }, “objectType”: “place” } ] |
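Reading the gnip object typically means collecting the matched rule identifiers and any derived profile locations. The sketch below is illustrative Python with hypothetical helper names. It reads id_str rather than id, since in the sample above the numeric id (…358500) differs from id_str (…358464), which suggests the number lost precision somewhere along the way. It also assumes point coordinates are ordered longitude, latitude, as the Toronto sample implies.

```python
def matching_rule_ids(activity: dict) -> list:
    """Return the string IDs of every rule the activity matched."""
    rules = activity.get("gnip", {}).get("matching_rules", [])
    return [rule["id_str"] for rule in rules]


def profile_points(activity: dict) -> list:
    """Return (displayName, latitude, longitude) for each derived point location."""
    points = []
    for location in activity.get("gnip", {}).get("profileLocations", []):
        geo = location.get("geo", {})
        if geo.get("type") == "point":
            # Assumption: coordinates are [longitude, latitude].
            longitude, latitude = geo["coordinates"]
            points.append((location.get("displayName"), latitude, longitude))
    return points


sample = {
    "gnip": {
        "matching_rules": [
            {"tag": None, "id": 1026514022567358500, "id_str": "1026514022567358464"}
        ],
        "profileLocations": [
            {
                "displayName": "Toronto, Ontario, Canada",
                "geo": {"coordinates": [-79.4163, 43.70011], "type": "point"},
                "objectType": "place",
            }
        ],
    }
}
print(matching_rule_ids(sample), profile_points(sample))
```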
Example:
Activity streams payload examples
Post activity
Reply Post activity
Post activity with long_object
Post activity with twitter_extended_entities
Retweet activity
Quote Tweet activity
Retweeted Quote Tweet activity
Tweet metadata timeline
Introduction
At its core, X is a public, real-time, and global communication network. Since 2006, X’s evolution has been driven by both user use-patterns and conventions and new product features and enhancements. If you are using X data for historical research, understanding the timeline of this evolution is important for surfacing Posts of interest from the data archive.
X was launched as a simple SMS mobile app, and has grown into a comprehensive communication platform with a complete set of APIs. APIs have always been a pillar of the X network. The first API hit the streets soon after X was launched. When geo-tagging Posts was first introduced in 2009, it was made available through a Geo API (and later the ability to ‘geo-tag’ a Post was integrated into the X.com user-interface). Today, X’s APIs drive the two-way communication network that has become the source of breaking news and sharing information. The opportunities to build on top of this global, real-time communication channel are endless.
X makes available two historical APIs that provide access to every publicly available Post: Historical PowerTrack and the Full-Archive Search API. Both APIs provide a set of operators used to query and collect Posts of interest. These operators match on a variety of attributes associated with every Post, hundreds of attributes such as the Post’s text content, the author’s account name, and links shared in the Post. Posts and their attributes are encoded in JSON, a common text-based data-interchange format. So as new features were introduced, new JSON attributes appeared, and typically new API operators were introduced to match on those attributes. If your use-case includes a need to listen to what the world has said on X, the better you understand when operators started having JSON metadata to match on, the more effective your historical PowerTrack filters can be.
Next, we will introduce some key concepts that set the stage for understanding how updates in Post metadata affect finding your data signal of interest.
Key concepts
From user-conventions to X first-class objects
X users organically introduced new, and now fundamental, communication patterns to the X network. A seminal example is the hashtag, now nearly universally used across all social networks. Hashtags were introduced as a way to organize conversations and topics. On a network with hundreds of millions of messages a day, tools to find Posts of interest are key, and hashtags have become a fundamental method. Soon after the use of hashtags grew, they received official status and support from X. As hashtags became a ‘first-class’ object, this meant many things. It meant hashtags became clickable/searchable in the X.com user interface. It also meant hashtags became a member of the X entities family, along with @mentions, attached media, stock symbols, and shared links. These entities are conveniently encoded in a pre-parsed JSON array, making it easier for developers to process, scan, and store them.
Retweets are another example of user-driven conventions becoming official objects. Retweeting emerged as a way of ‘forwarding’ content to others. It started as a manual process of copying/pasting a Post and prepending it with a “RT @” pattern. This process was eventually automated via a new Retweet button, complete with new JSON metadata. The ‘official’ Retweet was born. Other examples include ‘mentions’, sharing of media and web links, and sharing a location with your Post. Each of these use-patterns resulted in new x.com user-interface features, new supporting JSON, and thus new ways to match on Posts. All of these fundamental Post attributes have resulted in PowerTrack Operators used to match on them.
Post metadata, mutability, updates, and currency
While Post messages can be up to a fixed number of characters long, the JSON description of a Post consists of over 100 attributes. Attributes such as who posted, at what time, whether it’s an original Post or a Retweet, and an array of first-class objects such as hashtags, mentions, and shared links. For the account that posted, there is a User (or Actor) object with a variety of attributes that provide the user’s Profile and other account metadata. Profiles include a short biographical description, a home location (freeform text), preferred language, and an optional web site link.
Some account metadata never change (e.g. numeric user ID and created date), some change slowly over time, while other attributes change more frequently. People change jobs and move. Companies update their information. When you are collecting historical Posts, it is important to understand that some metadata is as it was when posted, and other metadata is as it is when the query is submitted.
With all historical APIs, the user’s profile description, display name, and profile ‘home’ attributes are updated to the values at the time of query.
“Native” media
X.com and X mobile apps support adding photos and videos to a Post by clicking a button and browsing your photo galleries. Now that they are integrated as first-class actions, videos and photos shared this way are referred to as ‘native’ media.
Many querying Operators work with these ‘native’ resources, including has:videos, has:images, and has:media. These will match only on media content that was shared via X features. To match on other media hosted off of the X platform, you’ll want to use Operators that match on URL metadata.
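A client-side counterpart to these Operators is to inspect the media array of twitter_extended_entities in a received payload. The Python sketch below is an approximation, not the Operators' actual matching logic. It assumes each media item carries a "type" key with values such as "photo" or "video", as in the native format; the function name is hypothetical.

```python
def native_media_types(activity: dict) -> set:
    """Return the set of native media types attached to an activity."""
    media = activity.get("twitter_extended_entities", {}).get("media", [])
    # Assumption: each media item has a "type" such as "photo" or "video".
    return {item["type"] for item in media if item.get("type")}


# Hypothetical, abbreviated activity with two native photos.
activity = {
    "twitter_extended_entities": {
        "media": [{"type": "photo"}, {"type": "photo"}],
    }
}
types = native_media_types(activity)
print("photo" in types, "video" in types)
```

An empty set means no native media was attached, in which case any shared media would have to be found through the URL metadata instead.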
So, before we dig into the Historical PowerTrack and Full-Archive Search product details, let’s take a tour of how X, as a product and platform, evolved over time.
X timeline
Below you will find a select timeline of X. Most of these X updates in some way fundamentally affected either user behavior, Post JSON contents, query Operators, or all three. Looking at X as an API platform, the following events in some way affected the JSON payloads that are used to encode Posts. In turn, those JSON details affect how X historical APIs match on them.
Note that this timeline list is generally precise and not exhaustive.
2006
- October
- @replies becomes a convention.
- Cashtags became a clickable/searchable link in June 2012.
- November - Favorites introduced.
2007
- January - @replies become a first-class object with a UI reply button with in_reply_to metadata.
- April - Retweets become a convention.
- August - #hashtags emerge as a primary tool for searching and organizing Posts.
2009
- February - $cashtags become a common convention for discussing stock ticker symbols.
- May - Retweet ‘beta’ is introduced with “Via @” prepended to Post body.
- June - Verified accounts introduced.
- August - Retweets become a first-class object with “RT @” pattern and new retweeted_status metadata.
- October - List feature launched.
- November - Post Geotagging API is launched, providing the first method for users to share location via third-party apps.
2010
- June - X Places introduced for geo-tagging Posts.
- August - Post button for websites is launched. Made sharing links easier.
2011
- May - Follow button introduced, making it easier to follow accounts associated with websites.
- August - Native photos introduced.
2012
- June - $Cashtags become a clickable/searchable link.
2014
- March - Photo tagging and up to four photos supported. Extended X Entities metadata was introduced.
- April - Emojis are natively supported in X UI. Emojis were commonly used in Posts since at least 2008.
2015
- April - A change in X’s ‘post’ user-interface design results in fewer Posts being geo-tagged.
- October - X Polls introduced. Polls originally supported two choices with a 24-hour voting period. In November, Polls started supporting four choices with voting periods from 5 minutes to seven days. Poll metadata made available (enriched native format only) in February 2017.
2016
- February - Searchable GIFs natively hosted in Post compose.
- May - “Doing More with 140” (dmw140) announced, stating plans for new ways of handling Replies and attached media with respect to a Post’s 140-character message.
- June - Native video support
- June - Quoted Retweets generally available.
- June - Stickers introduced for adding to photos.
- September - ‘Native attachments’ introduced with trailing URL not counted towards 140 characters (“dmw140, part 1”).
2017
- February - X Poll metadata included in Post metadata (enriched native format only).
- April - ‘Simplified Replies’ introduced with replied-to-accounts not counted towards 140 characters (“dmw140, part 2”).
2018
- May - GDPR updates: user.time_zone and user.utc_offset set to null, and user.profile_background_image_url set to a default value.
- June - Quote Tweet formatting changes.
2022
- September 29 - The ability to edit Posts is rolled out to a small test group. Edited Post metadata are added to the Post object where relevant. This includes edit_history and edit_controls objects. These metadata will not be returned for Posts that were created before editable functionality was added. No associated Operators for these metadata. To learn more about how Post edits work, see the Edit Posts fundamentals
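Since no Operators exist for the edit metadata, any edit handling happens on the received payload. The following Python sketch shows one way to read edit_history. It relies on the documented ordering of edit_tweet_ids (most recent version last) and treats a missing edit_history as a Post created before editing existed; the function names are hypothetical.

```python
def latest_version_id(post: dict):
    """Return the ID of the most recent version of a Post, or None if unknown."""
    history = post.get("edit_history")
    if not history:
        # Not returned for Posts created before edit functionality was added.
        return None
    # IDs are in ascending edit order, so the last one is the newest version.
    return history["edit_tweet_ids"][-1]


def was_edited(post: dict) -> bool:
    """True when the Post has more than one version."""
    history = post.get("edit_history") or {}
    return len(history.get("edit_tweet_ids", [])) > 1


post = {
    "edit_history": {
        "initial_tweet_id": "1283764123",
        "edit_tweet_ids": ["1283764123", "1394263866"],
    }
}
print(latest_version_id(post), was_edited(post))
```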
Filtering tips
Being familiar with the X timeline of when and how new features were added can help you create more effective queries. Here, a query means a filter or rule that is applied by the X historical APIs to the Post archive, using PowerTrack Operators to match on Post JSON. An example is the lang: Operator, which is used to match Posts in a specified language. X provides a language classification service (supporting over 50 languages), and X APIs provide this metadata in the JSON that is generated for every Post. So, if a Post is written in Spanish the “lang” JSON attribute is set to “es”. So, if you build a filter with the lang:es clause, it will only match on Post messages classified as Spanish.
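The same language metadata can be checked on payloads you have already collected. This is a minimal Python sketch with a hypothetical helper name. It assumes the classification appears as twitter_lang in Activity Streams payloads and as lang in native payloads.

```python
def post_language(post: dict):
    """Return the language code attached to a Post payload, or None."""
    # Activity Streams uses twitter_lang; the native format uses lang.
    return post.get("twitter_lang") or post.get("lang")


# Hypothetical, abbreviated payloads in each format.
posts = [
    {"body": "Hola mundo", "twitter_lang": "es"},
    {"text": "Hello world", "lang": "en"},
]
spanish = [p for p in posts if post_language(p) == "es"]
print(len(spanish))
```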
The timeline information can also help better interpret the Post data received. Say you were researching the sharing of content about the 2008 and 2012 Summer Olympics. If you applied only the is:retweet Operator to match on Retweets, no data would match in 2008. However, for 2012 there would likely be millions of Retweets. From this you potentially could erroneously conclude that in 2008 Retweets were not a user convention, or that simply no one Retweeted about those Olympics. Since Retweets became a first-class object in 2009, you need to add a “RT @” rule clause to help identify them in 2008.
Both Retweets and Post language classifying are examples of Post attributes with a long history and many product details. Below we will discuss more details of these and other attribute classes important to matching on and understanding X Data.
Recognizing false negatives
When it comes to writing filters, one important takeaway is that the metadata that Operators match on all have “born on” dates. If you build a filter with an Operator that acts on metadata introduced after the Post was posted, you’ll have a false negative. For example, say you are interested in all Posts that mention ‘snow’ and share a video. If you build a rule with the has:videos Operator, which matches on Posts with native videos, that clause will not match any Posts before 2015.
However, sharing of videos was common on X long before 2015. Before then users shared links to videos hosted elsewhere, but in 2015, X built new ‘sharing video’ features directly into the platform. For finding these earlier Posts of interest, you would include a rule clause such as url:"youtube.com".
Note that with the Search APIs, there are some examples of metadata being ‘backfilled’ when the index was rebuilt. One good example is the cashtag: after the cashtag Operator was introduced in 2015, the Search index was rebuilt, and in the process the symbol entity was extracted from all Post bodies, including those from early 2006, when $ was used mainly for slang, as in “I hope it $oon!”.
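One way to guard against this kind of false negative is to check a rule against a table of “born on” dates before submitting it. The Python sketch below is illustrative only. The dates are taken from this page for Historical PowerTrack, the exact days chosen for “August 2009” and “2015” are assumptions, and the substring check is a simplification rather than a real rule parser.

```python
from datetime import date

# "Born on" dates as described on this page (Historical PowerTrack).
# Day-of-month values for is:retweet and has:videos are assumptions.
BORN_ON = {
    "is:retweet": date(2009, 8, 1),
    "has:videos": date(2015, 1, 1),
    "lang:": date(2013, 3, 26),
}


def false_negative_risks(rule: str, start: date) -> list:
    """List Operators in the rule whose metadata did not exist at the period start."""
    return [op for op, born in BORN_ON.items() if op in rule and start < born]


print(false_negative_risks("snow has:videos lang:es", date(2012, 1, 1)))
```

An empty list does not guarantee complete results. It only means none of the listed Operators predates the requested period.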
Identifying and filtering on Post attributes important to your use-case
Some metadata, such as X account numeric IDs, have existed since day one (and are an example of account metadata that never changes). Other metadata was not introduced until well after X started in 2006. Examples of new metadata being introduced include Retweets metadata, Post locations, URL titles and descriptions, and ‘native’ media. Below are some of the most common types of Post attributes that have been fundamentally affected by these X platform updates.
Filtering/matching behavior for these depends, in most cases, on which historical Post API is used. To help determine which product is the best fit for your research and use-case, the attribute details provided below include high-level product information.
X Profiles
Since at its core X is a global real-time communication channel, research with Post data commonly has an emphasis on who is communicating. Often it is helpful to know where an X user calls home. Often knowing that an account bio includes mentions of interests and hobbies can lead you to Posts of interest. It is very common to want to listen for Posts from accounts of interest. Profile attributes are key to all of these use-cases.
Every account on X has a Profile that includes metadata such as X @handle, display name, a short bio, home location (freeform text entered by a user), number of followers and many others. Some attributes never change, such as numeric user ID and when the account was created. Others usually change day-to-day, week-to-week, or month-to-month, such as number of Posts posted and number of accounts followed and followers. Other account attributes can also change at any time, but tend to change less frequently: display name, home location, and bio.
The JSON payload for every Post includes account profile metadata for the Post’s author. If it is a Retweet, it also includes profile metadata for the account that posted the original Post.
The mutability of a Post’s profile metadata depends entirely on the historical product used. The Search APIs serve up historical Posts with the profile settings as they are at the time of retrieval. For Historical PowerTrack, the profile is as it was at the time the Post was posted, except for data before 2011. For Posts older than 2011, the profile metadata reflects the profile as it was in September 2011.
Original Post and Retweets
Retweets are another example of user-driven conventions becoming official objects. Retweeting emerged as a way of ‘forwarding’ content to others. It started as a manual process of copying/pasting a Post and prepending it with a “RT @” pattern. This process was eventually automated via a new Retweet button, complete with new JSON metadata. The ‘official’ Retweet was born and the action of retweeting became a first-class Post event. Along with the new Retweet button, new metadata was introduced such as the complete payload of the original Post.
Whether a Post is original or shared is a common filtering ‘switch.’ In some cases, only original content is needed. In other cases, Post engagement is of primary importance so Retweets are key. The PowerTrack is:retweet Operator enables users to either include or exclude Retweets. If pulling data from before August 2009, users need to have two strategies for Retweet matching (or not matching). Before August 2009, the Post message itself needs to be checked, using exact phrase matching, for matches on the “RT @” pattern. For periods after August 2009, the is:retweet Operator is available.
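The two strategies can be folded into a small helper that picks the Retweet clause from the period being queried. This is a hedged Python sketch: the helper name is hypothetical, the cutoff day within August 2009 is an assumption, and the OR grouping follows general PowerTrack rule syntax rather than anything specified on this page.

```python
from datetime import date

# Assumption: the first day of August 2009 is used as the cutoff.
RETWEET_METADATA_START = date(2009, 8, 1)


def retweet_clause(start: date, end: date) -> str:
    """Choose a rule clause for matching Retweets in the given period."""
    if end < RETWEET_METADATA_START:
        # Only the manual convention existed: match the exact phrase.
        return '"RT @"'
    if start >= RETWEET_METADATA_START:
        return "is:retweet"
    # The period spans the cutoff, so match either form.
    return '(is:retweet OR "RT @")'


print("olympics " + retweet_clause(date(2008, 8, 1), date(2008, 8, 31)))
```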
Post language classifications
The language a Post is written in is a common interest. Post language can help infer a Post’s location and often only a specific language is needed for analysis or display. (X profiles also have a preferred language setting.)
For filtering on a Post’s language classification, X’s historical products (Search API and Historical PowerTrack) are quite different. When the Search archive was built, all Posts were backfilled with the X language classification. Therefore the lang: Operator is available for the entire Post archive. With Historical PowerTrack, X’s language classification metadata is available in the archive beginning on March 26, 2013.
Geo-referencing Posts
Being able to tell where a Post was posted (i.e., geo-referencing it) is important to many use-cases. There are three primary methods for geo-referencing Posts:
- Geographical references in a Post message
- Posts geo-tagged by the user.
- Account profile ‘home’ location set by a user
Geographical references in a Post message
Matching on geographic references in the Post message, while often the most challenging method since it depends on local knowledge, is an option for the entire Post archive. Here is an example geo-referenced match from 2006 for the San Francisco area based on a ‘golden gate’ filter:
https://x.com/biz/statuses/28311
Posts geo-tagged by the user
In November 2009 X introduced its Post Geotagging API that enabled Posts to be geo-tagged with an exact location. In June 2010 X introduced X Places that represent a geographic area on the venue, neighborhood, or town scale. Approximately 1-2% of Posts are geo-tagged using either method.
The available geo-tagging history is dependent on the Historical API you are using. With the Search APIs the ability to start matching on Posts with some Geo Operators started in March 2010, and with others in February 2015. If you are using Historical PowerTrack, geo-referencing starts on September 1, 2011. When the Historical PowerTrack archive was built, all geo-tagging before this date was not included.
Account profile ‘home’ location set by a user
All X users have the opportunity to set their Profile Location, indicating their home location. Millions of X users provide this information, and it significantly increases the amount of geodata in the X Firehose. This location metadata is a non-normalized, user-generated, free-form string. Approximately 30% of accounts have Profile Geo metadata that can be resolved to the country level.
As with Post geo, the methods to match and the time periods available depend on the Historical API you are using. Historical PowerTrack enables users to attempt their own custom matching on these free-form strings. To help make that process easier, X also provides a Profile Geo Enrichment that performs the geocoding where possible, providing normalized metadata and corresponding Operators. Profile Geo Operators are available in both Historical PowerTrack and the Search APIs. With Historical PowerTrack, this Profile Geo metadata is available starting in June 2014. With the Search APIs, this metadata is available starting in February 2015.
Shared links and media
Sharing web page links, photos and videos has always been a fundamental X use-case. Early in its history, all of these actions involved including a URL link in the Post message itself. In 2011 X integrated sharing photos directly into its user-interface. In 2016, native videos were added.
Given this history, there are a variety of filtering Operators used for matching on this content. There are a set of Operators that match on whether Posts have shared links, photos, and videos. Also, since most URLs shared on X are shortened to use up fewer of a Post’s characters (e.g. generated by a service such as bitly or tinyurl), X provides data enrichments that generate a complete, expanded URL that can be matched on. For example, if you wanted to match on Posts that included links discussing X and Early-warning systems, a filter that references ‘severe weather communication’ would match a Post containing this http://bit.ly/1XV1tG4 URL.
In March 2012, the expanded URL enrichment was introduced. Before this time, the Post payloads included only the URL as provided by the user. So, if the user included a shortened URL it can be challenging to match on (expanded) URLs of interest. With both Historical PowerTrack and the Search APIs, these metadata are available starting in March 2012.
In July 2016, the enhanced URL enrichment was introduced. This enhanced version provides a web site’s HTML title and description in the Post payload, along with Operators for matching on those. With Historical PowerTrack, these metadata become available in July 2016. With the Search APIs, these metadata begin emerging in December 2014.
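On the payload side, the two URL enrichments show up in the gnip urls array described earlier. The Python sketch below (hypothetical helper name) collects the best available URL and title for each shared link. It falls back to twitter_entities when no enrichment data is present, as would be expected for Posts from before the enrichments were introduced.

```python
def shared_links(activity: dict) -> list:
    """Return a list of {"url", "title"} dicts for links shared in an activity."""
    enriched = activity.get("gnip", {}).get("urls", [])
    if enriched:
        return [
            {
                "url": link.get("expanded_url") or link.get("url"),
                "title": link.get("expanded_url_title"),
            }
            for link in enriched
        ]
    # No enrichment data: use whatever the entities object provides.
    entity_urls = activity.get("twitter_entities", {}).get("urls", [])
    return [
        {"url": link.get("expanded_url") or link.get("url"), "title": None}
        for link in entity_urls
    ]


sample = {
    "gnip": {
        "urls": [
            {
                "url": "https://t.co/tGQqNxxyhU",
                "expanded_url": "https://www.youtube.com/channel/UCwUxW2CV2p5mzjMBqvqLzJA",
                "expanded_status": 200,
                "expanded_url_title": "Birdys Daughter",
            }
        ]
    }
}
print(shared_links(sample))
```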
In September 2016 X introduced ‘native attachments’ where a trailing shared link is not counted against the 140 Post character limit. Both URL enrichments still apply to these shared links.
For other product-specific details on URL filtering, see the corresponding articles.