Working with Events in SQL Server (Pt. 1 – Theory & Set-up)
In this tutorial we will start to look at the concept of “events” relating to football fixtures and how they can be analysed using SQL Server in a practical context. As well as helping readers to understand Events, this tutorial also covers some basic database theory and the concept of entities and their relationships to each other (which is important when linking Fixtures to Events later on).
This is the first tutorial in a series of three and will focus more on the theory needed to understand and work with the relationships we find when working with fixtures and events. The second tutorial will cover the basics of building up some event based queries using SQL and the third will look at how more advanced SQL can be built up to answer complex, time-based questions.
The three part tutorial series Getting Started with SQL Server should be completed before attempting this tutorial. It is important this series is completed prior as the concepts discussed there will built upon further in this series.
Before getting into the more serious theory it is important to make clear what an event is, since we are looking at football data on this website we will primarily be talking about events relating to football matches themselves.
Firstly, think about all of the “things” which could happen during a regular football match:
- Red / Yellow cards
All of the above could be classified as “events”, this is because they happen during a fixture (and it is important we notice that they happen during the game).
But be careful…..
There will be other “attributes” associated with our football fixture which are not events, things such as:
- Full-time score
- Number of corners
- Number of shots on-goal
All of these are associated with the same fixture, but they are not events, this is because they do not happen during the fixture (rather, they are something which is associated with the fixture as a whole, or in the case of the full-time score, a way of summarising of all goal events).
For example, if I told you the full time score of a game was 1-0, I would be giving you an “attribute” belonging to that fixture (this being the “full time score” attribute). This attribute will never change and once the game has finished the score will always be 1-0.
This information above might be very useful, but crucially it will not tell us when this goal was scored (nor who scored it or if it was in the first or second half etc.), instead it is simply a result and nothing more. If I was instead to say the home team scored the only goal in the 90th minute this would give us the same information (i.e. a 1-0 win) but with more depth as we would know that because of being scored in the 90th minute this was probably a slow fixture to watch!
Looking at the Theory behind Events
So far we have established that an event is something which happens during a fixture and (crucially) has a timestamp associated with it (e.g. the only goal scored in the 90th minute). Before setting up our database and working with events hands-on, lets take a brief look at the theory behind how we will be working with such events.
This theory is basically related to general database theory itself, something I don’t want to go into in-depth right here (there is no need) however, it is important we understand at least one core concept – entities and their relationships to each other.
What are Entities?
When talking about database theory an “entity” is basically a “thing” (something in which exists either physically or virtually and we can store data about in a database). It is very difficult to conceptualise such entities when you first come across them (in this database theory context) and as such I will throw out a few football data related examples.
Let’s start by thinking about a “fixture”, a fixture is an entity (even though it doesn’t physically exist) and we can can store data (attributes) about it it in a database table. For example, a fixture will have a date, kick-off time, home team, away team, full-time score, and so on. All of these “attributes” belong to this fixture and can be stored in our database, this, of course, also applies equally to every other fixture we hold data upon.
Next, let’s look at a “team”, something else which could be considered an entity and something else which we can store data upon based on its attributes (values such as team name, home town, year established etc.)
And finally (in this explanation at least) we can look at “player” which is also an entity in this context. Players can be considered entities as they also have attributes associated with them such as name, DOB, height, position played etc.
All of the above (Fixtures, Teams and Players) are examples of entities, they are, of course, many other entities relating to a football match (think about “fans”, “Officials” (referees), “Leagues”, “Seasons” etc.) but the underlying concept is always the same, they are basically “things” for which we can store data on in our database.
In the next section we will start to look at how these entities are all related to each other in more detail and then a little later on bring this example back to Fixtures and Events, these being the two main entities we will be concentrating on later in this tutorial.
What are Relationships?
Now we know what an entity is we can start looking at how these entities are related to each other, for example, any given fixture entity will be made up of exactly 2x team entities (i.e. a fixture comprises of two teams playing each other). We can then go on to say that a team (entity) will be made up of multiple players (i.e. player entities).
So, as per the above statement, a fixture is related to a team in that the team entity is part of multiple entities used to make up a Fixture (a “fixture” has to teams). We can also say that a player entity is related to a team entity in that the player might play for said team (and by extension the player entity is therefore indirectly related to the fixture entity as well).
This might sound a bit too obvious so far, of course a fixture cant go ahead without teams but it is the relationships themselves we are looking at. For example, we wouldn’t store all of the team information in the fixtures table itself, we would instead just store the team id for each team and then look up further team info (e.g. the year they were founded) as and when we need to.
Bringing it back to Fixtures and Events
I appreciate this concept might be a little bit confusing at this point, so let’s bring this right back to topic and start looking specifically at Fixtures and Events again (both, of course examples of entities as we have just been discussing).
In the image below we have two Pro-Data data sheets, one is the results sheet for the English Premier League 19/20 (which represents a collection of fixtures) and the second spreadsheet (lower down in the same image) contains all events for the same league and season (i.e. the collection of events).
NB – “Fixture” and “Event” are both entities and related to each other in that any given Fixture will have multiple Events associated with it!
As you can see in the top spreadsheet (above) I have highlighted a single fixture (Liverpool vs Norwich City) and then in the lower events sheet highlighted all of the events associated with this fixture (remember a fixture can have potentially many events associated with it).
Now here is the key part, when it comes to linking these two entities together, the “keys”!
A “key” (in database terminology) is basically a unique ID field (or a combination of multiple fields together) which uniquely identifies any given row in a table. Going back to our our results spreadsheet, notice that every fixture has assigned to it a unique id number (this is “FIX_ID”), this means if we know this unique reference number we also know the rest of the data on said Fixture!
The same thing applies to the events spreadsheet, event though we will have multiple events per fixture, notice that each event still has its own unique ID number which can be used to identify it from all other fixtures.
Next, if we stay looking at the events spreadsheet you might also notice that each event also has the ID for the fixture that it belongs to (this is FIX_ID again and in this case is referred to as a “foreign key”). With this foreign key we can now group together all events that belong to a specified fixture simply by knowing the fixture’s unique ID number!
NB – We will get into the SQL around how this works in the next part of this tutorial series, putting this concept into practice with SQL should hopefully help make this more clear.
So, to finish off this section if we (again) look at the image of the Fixtures and Events spreadsheets overlapping (just above) it should be clear how Fixtures and Events relate to each other.
Remember, this is just for one league and a single season in this example, we could make use of multiple leagues across multiple seasons providing we have access to all of the relevant spreadsheets with the correct data in them.
In the next section we will get our SQL Server database set-up and ready for working with events before looking at this theory (with some real examples) in more detail in the second part of this tutorial series.
Setting up SQL Server
At this point of the tutorial you should have a basic understanding of what Fixtures and Events actually are, how they are linked to each other and how this information is delivered using the Pro-Data service (via the above two spreadsheets, one for events and another for fixtures).
In this section we will be getting our SQL Server database ready for making use of this new Events data!
Why is SQL Server Needed for Events?
It is, of course, possible to work with events without the use of SQL Server (or any other database system for that matter), but doing so will be considerably more difficult and time consuming than simply making use of the power of such software.
This difficulty is party due to the fact that the number of links (i.e. relationships) between the two spreadsheets will quickly start to grow at a big multiple as you try and bring more and more fixtures into your workings (remember, each Fixture can have potentially dozens of events associated with it!).
Even if you are proficient with automation using tools such as Excel (and the use of macros) this will not really solve the problem of making such data easier to work with, instead (and much more likely) this will serve to make an already complicated concept even more convoluted and difficult to stay on top of (and potentially introduce more errors into the mix as well).
Setting-up Our Existing Database
Whilst working through the three part series on Getting Started with SQL Server we already built up a database and imported a results (fixtures) table into it. In the remainder of this tutorial we will make use of this existing database and build it up a little further so that we can start working with events in SQL.
Step 1: We should already have the results table for the English Premier League (19/20 season) as a table in this database (this was done as a part of the Getting Started with SQL Server tutorial which should be completed before attempting this tutorial).
Next, let’s get the events table for this same league and season into the database as well (you can find this under the EPL data page as is shown below):
Simply click on the link (above) and download the file to the data-files folder we created in the Getting started with SQL Server tutorial.
Step 2: Now we have our events file ready we can import the data within it into our database in the same way used for the results table which has already been imported. Again, this import process is covered fully in the Getting started with SQL Server tutorial should you need a refresher on how it was done.
Step 3: With the new Events table added to our database we need to check the data has imported correctly (this is good practice to get into the habit of doing). Firstly, in SQL Server Management Studio, right click on tables in the object explorer and then right click on the new Events table which was just imported and select “Select Top 100 Rows” from the menu to confirm the data has imported correctly (like below).
With this final step, providing the data has imported correctly as per the above image, we now have our database ready for the second part in this tutorial series!
Where to Next?
As was mentioned earlier, this is the first in a three part series of tutorials for understanding and working with events. After completing this tutorials you will be ready to move onto part two of the series (Building our First Database) where we start looking at the basic SQL code for working with fixtures and their related events.
Remember, if you have experienced any problems whilst following this tutorial please get in touch with me via the Pro-Data support form or feel free to leave a comment below this article.