• We probably need a vignette describing what a mention looks like. Although grouse documentation?

Introduction

This is an introduction

This will provide an introduction to filtering for data in BrandsEye

BrandsEye’s API supports a small language to help you select exactly the mentions that you want to select. You’ll do this with the help of the mentions() function, which always expects a filter. You can also count and aggregate mention data using count_mentions(). We also have a query language, that this will introduce a bit of.

Combined with dplyr1 for manipulating data, ggplot22 for visualising data, and readr3 for exporting data, brandseyer2 is a powerful tool for examining social media data.

Filtering is selecting individual mentions to make up a data set, either to be returned directly using the mentions() function, or for counting and aggregating using count_mentions().

library(brandseyer2)

account("TEST01AA") %>% 
  mentions(filter = "published inthelast week")  
#> # A tibble: 1 x 7
#>   id    published           brands   sentiment mediaLinks   socialNetwork tags  
#>   <chr> <dttm>              <list>       <int> <list>       <chr>         <list>
#> 1 2-1   2018-03-03 21:37:35 <int [2…         0 <tibble [1 … TWITTER       <int …

Filtering

Dates

All mentions are published at a particular date and time. This is the most important thing that you can filter on: the publication date. Most questions that we want to answer usually have a constrained time period. For example,

what was the sentiment towards our brand last month?

or

how did sentiment change towards our brand over Christmas last year compared to Christmas the year before?

The published field allows us to filter on mentions by their date of publication. For example,

published after ‘2018/01/01’

will filter for mentions that are published on or after the 1st of January, 2018. Note that mentions published on the first are also included by the filter.

published before ‘2018/02/01’

filters for mentions published strictly before the 1st of February, 2018. It does not include mentions from the 1st of February.

These two filters can be combined using the word and, like so:

published after ‘2018/01/01’ and published before ‘2018/02/01’

Some things to notice in this filter: it’s made up of two sub-filters, the after bit, and the before bit. These have been combined using an and. Dates themselves are quoted in single-quotes, like so: ‘2018/01/01’.

Only mentions for which are published both after 2018/01/01 and before 2018/02/02, will be selected by this filter. In other words, this filter selects mentions published during the month of January.

We will have more to say about the connector and in a later section of this guide.

You can test this out now, using either analyse or brandseyer2.

Trying this out

In analyse

The brand selector.

Using brandseyer2

All of our examples, going forward, will assume that you have access to your accounts, and that you’re using brandseyer2. We won’t do anything fancy, just the bare minimum brandseyer2 to demonstrate the data.

mentions <- account("QUIR01BA") %>% 
  filter_mentions("published after '2018/01/01' and published before '2018/02/01'" ) %>% 
  count_mentions
  
mentions
#> # A tibble: 1 x 1
#>   mentionCount
#>          <int>
#> 1        36592

So, we have 36 592 mentions published in the month of January. Instead of counting the mentions with count_mentions(), you could instead use mentions() to pull a table of all of these mentions.

Relative dates

The filter language provides some convenient shortcuts. A common one is to find mentions from the last week

published inthelast week

Notice that inthelast is written as one word.

Similarly, we can find mentions

  • published inthelast hour
  • published inthelast 24hours
  • published inthelast day
  • published inthelast week
  • published inthelast month
  • published inthelast quarter
  • published inthelast year

Dates first, since this allows people to constrain data, rather than get everything. How do they cancel if they’re accidently pulling more than they want to?

  • Published date is the important one
  • How to filter on hours.
  • Addendum: picked up date, updated date, what are these.

Grouping by date

  • hour
  • day
  • week
  • month
  • year

Selecting mentions

  • By ID
  • by conversation ID

Relevancy

Relevancy isnt irrelevant

Filter Patterns

Basic structure of a statement

There are three parts to a filter: a field, an operator, and a value. The field is a part of a mention. The operator looks different depending on whether it’s a number or text.

Joining together statements

  • What is AND? Both things must be true.
  • What is OR? Either things must be true.
  • It’s not English! There’s a distinct meaning.
  • This is called Boolean Logic, or Boolean Algebra4, after George Boole, a mathematician who published in the 1800s on logic and “rules of thought”.

More filtering

Where are mentions from?

Just give me things from twitter

socialNetwork is TWITTER

Social network codes can be obtained using data_model_networks()

Currently, these are

data_model_networks()
#> # A tibble: 9 x 2
#>   id          name       
#>   <chr>       <chr>      
#> 1 TWITTER     Twitter    
#> 2 FACEBOOK    Facebook   
#> 3 INSTAGRAM   Instagram  
#> 4 GOOGLE_PLUS Google Plus
#> 5 LINKEDIN    LinkedIn   
#> 6 TUMBLR      Tumblr     
#> 7 VK          VK         
#> 8 YOUTUBE     YouTube    
#> 9 TELEGRAM    Telegram

Sentiment

Mentions by themselves are not always interesting. Mentions, as handled by BrandsEye, relate towards something that we name a brand. This is not necessarily a brand in the strict sense of the word. It certainly could be, but in this case, it’s the entity or concept or thing that the mention is related towards.

It is also, importantly, the thing that we measure sentiment towards.

Sentiment doesn’t exist in a vacuum, it is usually related to something.

Are things from the brand itself, consumers, or the press?

media is ENTERPRISE

We currently only have three main categories at the moment.

ID Meaning
ENTERPRISE From the brand itself
PRESS From a media source
CONSUMER From everyone else

But you can list all the ones we recognise using data_model_categories().

Tags

We introduce tags early, since topics are just tags.

Topics

Data verification

process is verified

crowdVerified is true

What’s the difference, and when to use which.

Things that should always be in filters

  • brands
  • topic trees?

Location

  • Filtering by continents and large geographic regions.
  • Filtering by country
  • Filtering by country region / province / state.
  • Filtering by city

We use two letter SO 3166-1 Alpha 2 country codes to represent the countries themselves. You can list them using data_model_countries()

data_model_countries()
#> # A tibble: 252 x 2
#>    id    name               
#>    <chr> <chr>              
#>  1 AF    Afghanistan        
#>  2 AX    Aland Islands      
#>  3 AL    Albania            
#>  4 DZ    Algeria            
#>  5 AS    American Samoa     
#>  6 AD    Andorra            
#>  7 AO    Angola             
#>  8 AI    Anguilla           
#>  9 AQ    Antarctica         
#> 10 AG    Antigua and Barbuda
#> # ... with 242 more rows

Analyse is required to get all of these names.

Language

Gender

Joining together statements revisted

Now that we have more statements, we can give more examples.

  • Precedence.
  • Bracketing.

We use 2-letter ISO 639-1 codes to represent languages. You can list all the ones we understand using data_model_languages():

data_model_languages()
#> # A tibble: 149 x 2
#>    id    name     
#>    <chr> <chr>    
#>  1 ab    Abkhazian
#>  2 aa    Afar     
#>  3 af    Afrikaans
#>  4 sq    Albanian 
#>  5 am    Amharic  
#>  6 ar    Arabic   
#>  7 an    Aragonese
#>  8 hy    Armenian 
#>  9 as    Assamese 
#> 10 ay    Aymara   
#> # ... with 139 more rows

Author information

Finding replies and reshares

Handling conversations

While we’ve so far dealt with filtering individual mentions, we can also filter whole conversations.


  1. dplyr is an amazing library for manipulating data. After downloading data using brandseyer2, it allows you to reshape the data to fit your needs.↩︎

  2. ggplot2 is a rich visualisation library for R. After pulling data from brandseyer2, ggplot2 is a great tool for visualising the data.↩︎

  3. readr is a library for reading and writing CSV data. R has a rich ecosystem for importing and exporting data, with other libraries providing support for Excel and Google Sheets.↩︎

  4. You can ead more about Boolean Logic on Wikipedia.↩︎