Basic Sorting & Filtering within Spreadsheets
This tutorial aims to introduce basic sorting and filtering techniques using commonly available spreadsheet applications. Such techniques are important to master when working with large volumes of data such as those found on this website.
This tutorial requires a spreadsheet application, web browser and text editor to be installed, most people will already have such software on their PC.
Whilst this tutorial makes use of LibreOffice Calc, the same principals will apply to any other mainstream spreadsheet application including Microsoft Excel. LibreOffice Calc is a good choice of free and open source spreadsheet application should you not want to purchase Microsoft Excel and can be downloaded here: (https://www.libreoffice.org/).
Obtaining Test Data
To begin this tutorial we will need to download any of the season results data files from within the Pro-Data members section, navigate to the Football Data (Leagues) page and choose a league to begin (as is shown below).
I have chosen to work with the English Premier League but this tutorial will also work with any of the results files for any other leagues. Once you have reached a league’s individual download page click to download one of the results files (NB: make sure this is a results file and not an events file).
As can be seen in the image above, I have chosen to work with the Premier League 2019/2020 results file, if you are using LibreOffice Calc (like me) you will see two notifications when opening the file, the first is a preview box which shows the data within the CSV file, simply click OK if you see this:
Also, when the file does open it will (by default) be in read only mode, simply click the “Edit Document” button on the blue bar atop the page to allow editing of the file:
At this point we have our file ready to work on, we can now save it to our PC, to do this go to File > Save As and save the file as a xlsx file (NB the Save as Type box is set to xlsx):
At this point we will be ready to work with the data within this file.
How our Data is Stored (Data Types)
Before getting any further into the sorting and filtering of data it is really important we understand how our data is stored within this file, in other words, to realise our data can be stored as multiple different types including text, integers (whole numbers), decimals and boolean values amongst other things.
As humans we can easily understand the difference between a string of text reading “123” and the number 123, but when working with data it is sometimes important to specify to the computer what the data is suppose to be. This is especially important when sorting and filtering data as is done in the remainder of this tutorial.
For example, if we look in our recently opened spreadsheet at the FTHG column we can see, for each fixture, the number of home goals which have been scored. At this point in time the computer will see this value as a text value, we need to change this to be a numeric value going forwards as we will sort and filter on this column in the next section.
To begin, first click in the column letter above FTHG, this should be the letter “L” if you are following along with a standard data sheet from the website:
Once highlighted right-click on any of the highlighted cells and then select Format Cells:
Once the Format Cells box opens up click on Number in the category list and then -12345 in the Format list (since we cant have anything other than whole numbers when looking at goals we don’t need to worry about the decimal points here). You choice should look like the below:
Then finally click OK to apply the settings.
Basic Sorting of Data
If you look at any of the spreadsheets available with the Pro-Data service you will notice they are already pre-sorted in date ascending order, in other words, the oldest games start at the top and are then recorded in date order with the final games towards the bottom.
This ordering is fine for those wanting a quick glance at results, there might be times, however when we need to sort our data in other ways. For example, we might want to games with the most home goals scored with the highest scorers at the top, let’s start by putting this scenario into practice.
To begin with we must select all occupied cells within the spreadsheet, this is necessary because all cells are linked to each other on a line-by-line basis. For example, on any given line the DATE field is the date for all other columns on that same line, as is the number of home goals scored the same for any other aspect (column) of the same game.
The easiest way to select all cells is to click on the top right corner in-between the A column and row number 1 as is highlighted below:
Once all cell are highlighted we must then go the Data menu and then click on Sort:
At this point we will have open the Sorting box, from here we can perform our sorting operations on the pre-selected data.
Before going any further there is one important thing to be aware of, this is the column heading found within this particular file (and all other football data files found on this site). Essentially, data files like these sometimes have column headings and sometimes they don’t, in cases where we do have column headings (such as when using Pro-Data files) we must tell the sorting tool that we have these headings. To do this first navigate to the Options tab and then select the box titled “Range contains column labels” as is shown below:
Once this box is checked we can go back to the Sort Criteria tab and select a column heading to sort with, under Sort Key 1 choose the FTHG column and also tick for Descending order as is shown below:
We will now have all of our games sorted by the number of home goals with the highest number at the top:
Whilst looking at this data note how under the DATE heading it now shows the games are no longer in date descending order as they were before, this shows that the whole set of data has been sorted and not just the FTHG column itself (which is what we want).
Basic Filtering of Data
Now we have performed some basic sorting on our data (we now have games in order of the number of home goals scored) we can look at filtering the data set.
For example, we could ask to see only games with 2 or more home goals scored, to begin looking for this we must start by highlighting all cells again (like we did before) and then going to the Data > More Filters > Standard Filter menu like below:
Once in the Standard Filter box we can apply one or more conditions to our FTHG column (or any other column for that matter), start by selecting the FTHG column under the Field Name heading and then under the Condition heading choose the “more than” operator (>), like so:
The operator (>) means ‘more than’, so it will filter out anything which is more than the value we set, this value can be set in the final box labelled Value, when finished our box should look like the one below:
As can be seen above, my filter is specifying I filter out only games with more than 4 home goals scored (so, at least 5 home goals), this produces a much smaller data set to work with like so:
We can extend our filter even further by adding multiple clauses to the filtering condition, for example to show fixtures with more than 4 goals but less than 7 goals we could use the following:
As per the above image, notice how we now have two lines in the Standard Filter box, the addition of the AND clause (on the left of the second line) means both conditions must be true. This configuration will now filter out all games with more than 4 home goals and less than 8 home goals.
This tutorial has introduced the basic concepts of sorting and filtering data within a spreadsheet. These are two important functions which can be extended much further than is shown here and these basics will be expanded throughout other tutorials within the Pro-Data members website.
It is important you take notice of the first section on data types, this will become a key consideration when working with numeric data going forwards!
Where to Next
By now you should feel comfortable finding data files on the Pro-Data members site and working with the results. Next, I would suggest looking at some more advanced spreadsheet related tutorials or moving on to begin building the Pro-Data SQL Database (click here to start the 3 part tutorial).