This guide provides an introduction to filtering data in BrandsEye.
BrandsEye’s API supports a small filter language that helps you select exactly the mentions you want. You’ll do this with the help of the mentions() function, which always expects a filter. You can also count and aggregate mention data using count_mentions(). This guide introduces part of that filter language.
Combined with dplyr [1] for manipulating data, ggplot2 [2] for visualising data, and readr [3] for exporting data, brandseyer2 is a powerful tool for examining social media data.
Filtering is selecting individual mentions to make up a data set, either to be returned directly using the mentions() function, or for counting and aggregating using count_mentions().
library(brandseyer2)

# Fetch mentions published in the last week for a test account.
account("TEST01AA") %>%
  mentions(filter = "published inthelast week")
#> # A tibble: 1 x 7
#> id published brands sentiment mediaLinks socialNetwork tags
#> <chr> <dttm> <list> <int> <list> <chr> <list>
#> 1 2-1 2018-03-03 21:37:35 <int [2… 0 <tibble [1 … TWITTER <int …
All mentions are published at a particular date and time. The publication date is the most important thing that you can filter on, since most questions that we want to answer are constrained to a time period. For example,
what was the sentiment towards our brand last month?
or
how did sentiment change towards our brand over Christmas last year compared to Christmas the year before?
The published field allows us to filter mentions by their date of publication. For example,
published after '2018/01/01'
will filter for mentions that are published on or after the 1st of January, 2018. Note that mentions published on the first are also included by the filter.
published before '2018/02/01'
filters for mentions published strictly before the 1st of February, 2018. It does not include mentions from the 1st of February.
These two filters can be combined using the word "and", like so:
published after '2018/01/01' and published before '2018/02/01'
Some things to notice in this filter: it’s made up of two sub-filters, the after part and the before part. These have been combined using "and". Dates themselves are quoted in single quotes, like so: '2018/01/01'.
Only mentions that are published both on or after 2018/01/01 and before 2018/02/01 will be selected by this filter. In other words, this filter selects mentions published during the month of January.
We will have more to say about the connector "and" in a later section of this guide.
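The date-range filter above can also be assembled programmatically. The following is a minimal sketch using only base R. The helper name published_between() is hypothetical and is not part of brandseyer2. It takes a start date (included) and an end date (excluded), mirroring the after and before behaviour described above.

```r
# Hypothetical helper (not part of brandseyer2): build a filter string
# selecting mentions published on or after `from` and strictly before `to`.
published_between <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  stopifnot(from < to)  # an empty or reversed range is almost certainly a mistake
  sprintf(
    "published after '%s' and published before '%s'",
    format(from, "%Y/%m/%d"),  # the filter language writes dates as yyyy/mm/dd
    format(to, "%Y/%m/%d")
  )
}

january <- published_between("2018-01-01", "2018-02-01")
january
#> [1] "published after '2018/01/01' and published before '2018/02/01'"
```

The resulting string can then be used wherever a filter is expected, for example as the filter argument of mentions().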
You can test this out now, using either Analyse or brandseyer2.
All of our examples, going forward, will assume that you have access to your accounts, and that you’re using brandseyer2. We won’t do anything fancy, just the bare minimum of brandseyer2 needed to demonstrate the data.
# Count the mentions published during January 2018.
january_count <- account("QUIR01BA") %>%
  filter_mentions("published after '2018/01/01' and published before '2018/02/01'") %>%
  count_mentions()
january_count
#> # A tibble: 1 x 1
#> mentionCount
#> <int>
#> 1 36592
So, we have 36 592 mentions published in the month of January. Instead of counting the mentions with count_mentions(), you could use mentions() to pull a table of all of these mentions.
The filter language provides some convenient shortcuts. A common one is to find mentions from the last week:
published inthelast week
Notice that inthelast is written as one word.
Similarly, we can find mentions
Dates first, since this allows people to constrain data, rather than get everything. How do they cancel if they’re accidentally pulling more than they want to?
There are three parts to a filter: a field, an operator, and a value. The field is a part of a mention. The operator looks different depending on whether the value is a number or text.
AND? Both things must be true.
OR? Either thing must be true.
Just give me things from Twitter:
socialNetwork is TWITTER
Social network codes can be obtained using data_model_networks(). Currently, these are:
data_model_networks()
#> # A tibble: 9 x 2
#> id name
#> <chr> <chr>
#> 1 TWITTER Twitter
#> 2 FACEBOOK Facebook
#> 3 INSTAGRAM Instagram
#> 4 GOOGLE_PLUS Google Plus
#> 5 LINKEDIN LinkedIn
#> 6 TUMBLR Tumblr
#> 7 VK VK
#> 8 YOUTUBE YouTube
#> 9 TELEGRAM Telegram
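One way to guard against typos in network codes is to check them before building the filter. The sketch below uses base R only. The helper network_filter() is hypothetical and not part of brandseyer2, and the vector of codes is copied from the table above rather than fetched with data_model_networks(), so it may go out of date.

```r
# Network codes copied from the data_model_networks() output above.
known_networks <- c(
  "TWITTER", "FACEBOOK", "INSTAGRAM", "GOOGLE_PLUS", "LINKEDIN",
  "TUMBLR", "VK", "YOUTUBE", "TELEGRAM"
)

# Hypothetical helper (not part of brandseyer2): build a socialNetwork
# clause, stopping early if the code is not in the known list.
network_filter <- function(code) {
  code <- toupper(code)
  if (!code %in% known_networks) {
    stop("Unknown social network code: ", code)
  }
  paste("socialNetwork is", code)
}

# Combine with a date clause using `and`, as described earlier.
twitter_last_week <- paste("published inthelast week", "and", network_filter("twitter"))
twitter_last_week
#> [1] "published inthelast week and socialNetwork is TWITTER"
```

Checking the code locally means a misspelt network name fails immediately, instead of being sent to the API as part of a filter.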
Mentions by themselves are not always interesting. Mentions, as handled by BrandsEye, relate to something that we call a brand. This is not necessarily a brand in the strict sense of the word. It certainly could be, but here it means the entity, concept, or thing that the mention relates to.
It is also, importantly, the thing that we measure sentiment towards.
Sentiment doesn’t exist in a vacuum; it is usually related to something.
media is ENTERPRISE
We currently have only three main categories.
ID | Meaning |
---|---|
ENTERPRISE | From the brand itself |
PRESS | From a media source |
CONSUMER | From everyone else |
But you can list all the ones we recognise using data_model_categories().
process is verified
crowdVerified is true
What’s the difference, and when should you use which?
We use two-letter ISO 3166-1 alpha-2 country codes to represent countries. You can list them using data_model_countries():
data_model_countries()
#> # A tibble: 252 x 2
#> id name
#> <chr> <chr>
#> 1 AF Afghanistan
#> 2 AX Aland Islands
#> 3 AL Albania
#> 4 DZ Algeria
#> 5 AS American Samoa
#> 6 AD Andorra
#> 7 AO Angola
#> 8 AI Anguilla
#> 9 AQ Antarctica
#> 10 AG Antigua and Barbuda
#> # ... with 242 more rows
Analyse is required to get all of these names.
Now that we have more statements, we can give more examples.
We use two-letter ISO 639-1 codes to represent languages. You can list all the ones we understand using data_model_languages():
data_model_languages()
#> # A tibble: 149 x 2
#> id name
#> <chr> <chr>
#> 1 ab Abkhazian
#> 2 aa Afar
#> 3 af Afrikaans
#> 4 sq Albanian
#> 5 am Amharic
#> 6 ar Arabic
#> 7 an Aragonese
#> 8 hy Armenian
#> 9 as Assamese
#> 10 ay Aymara
#> # ... with 139 more rows
[1] dplyr is an amazing library for manipulating data. After downloading data using brandseyer2, it allows you to reshape the data to fit your needs.
[2] ggplot2 is a rich visualisation library for R. After pulling data from brandseyer2, ggplot2 is a great tool for visualising the data.
[3] readr is a library for reading and writing CSV data. R has a rich ecosystem for importing and exporting data, with other libraries providing support for Excel and Google Sheets.