Getting Started with SQL Server (Pt. 2 – Building our First Database)

Table of Contents
    Add a header to begin generating the table of contents

    Introduction

    This is the second tutorial in a series of three tutorials aiming to give students a basic working knowledge of Microsoft SQL Server (relating to working with football data). This second tutorial builds upon the first so you must have completed this prior to starting this one!

    In this tutorial we will be looking at first obtaining and preparing some football data from the Pro-Data results service, importing it into our newly installed SQL Server using the automated importing tools and then finally verifying everything has worked as it should.

    At the end of this tutorial you will have your first, basic, football stats database in SQL Server available, this will then be built upon further in the third tutorial where we look at some more advanced data retrieval methods using SQL.

    Prerequisites

    This tutorial requires that the first tutorial in this series be completed first (Getting Started with SQL Server (Pt. 1 – Downloading & Installing)). It is inadvisable to attempt this tutorial without first fully completing this first tutorial beforehand!

    Investigating possible data sources

    Before getting back into SQL Server and building our first database we first need to look at obtaining some football data to fill this database with (databases aren’t much use when they are empty!). As such, the first section of this tutorial will take a quick look at obtaining and preparing data available with the Pro-Data subscription.

    Go out onto the web and you will find there are many places offering football stats, many of which cater to the casual sports bettor and feature fancy graphs and graphics to make them seem more useful than they might actually be. Data such as this is readily available on the web and often free but unfortunately is of little use when building our stats database.

    The problem with most of the football stats you find on the web is they are pre-processed, in other words it is not the raw match data you are looking at but instead the result of someone else’s analysis on such data. So, to develop our football database in SQL Server we actually require the raw, underling football data (i.e. match results) and we need it in a computer readable format (such as a CSV file from the Pro-Data service, as is shown below):

    pro-data csv example in notepad

    Of course, there are lots of places which will provide computer readable sports data feeds however, many of these charge a fee to access them. Some more basic feeds can be very reasonable in cost whilst others which provide vast amounts of statistics on games including live stats can be very expensive!

    Fortunately, thanks to a full membership of FootballDataPro.com we can make use of the detailed selection of results and stats data found within the Football Data (Leagues) pages themselves.

    Whilst we will be using our own FootballDataPro.com data files for this tutorial it is really important to say at this stage that the skills we are developing here will eventually be transferable for use with virtually any data source. This means if you decide upon a new source of data in the future the skills you have developed during this tutorial will allow you to work with this new data source and import it into your stats database.

    Downloading our data files

    Before we start downloading our files it is a good idea to create a new folder for our football database project and keep all of the project resources together. I have called my folder “football-stats-db” and will keep everything relating to my database (and this tutorial) within it:

    football stats db folder

    Additionally, inside of this new folder I have created a second folder called “data-files”, this will be used to keep the raw data files downloaded during the building of the database.

    data files folder

    Next, we need to head on over to the Football Data (Leagues) section of the Pro-Data members site and obtain some data files to import, I will be using the 2019 / 2020 season of the English Premier League for the purposes of this tutorial.

    pro-data epl league page

    Once we arrive at the Premier League section we will see a list of links, each one representing a full season of scores and match statistics. To get started locate the 2019 / 2020 season and click on the link to download the file:

    pro-data epl league page csv list

    Your browser will now ask you to open or save the first data file, for now let’s save it in the “data-files” folder we created earlier on:

    pro-data downloading epl 19-20 csv file

    Next, once the file has downloaded we can open it up and have a look at the data within. As it is a CSV file It will open up in LibreOffice Calc by default and this is fine for now as we just want a quick look at the format. When LibreOffice Calc starts up we will see a screen asking us to import the CSV, don’t worry about this for now and just click OK:

    pro-data epl 19-20 libreoffice import screen

    Once we have the file open you will notice there is quite a lot going on and it can look a bit overwhelming at first, so let’s go through this file step by step:

    The Column Headers

    The first row in the spreadsheet is that of the column headers, each piece of match information including the date, the home and away team names, number of home goals etc. are each given a column name in the file:

    pro-data league csv headings highlighted

    You will probably have noticed there are a lot of columns, some of these are obvious as to what information they contain and others aren’t. To download the “key-descriptions” file which contains a description of each column go to the bottom of any league page where it can be downloaded or the dedicated Key Descriptions page for use of the online version.

    pro-data key decriptions file

    Click on the link to download the key descriptions file, save it in the “football-stats-db” folder and then open it up to see the notes:

    pro-data key descriptions file contents

    As can be seen in the image above, the key descriptions file lists each column name along with a description as to what that column actually contains. For example, in the image above we see the column named “FTHG” translates to “Full Time Home Team Goals”. It is useful to keep this file to hand for future reference!

    The Rows

    pro-data csv rows highlighted

    All of the rows below the first row (column headings) contain information for each game played in this particular league and season. Each row contains an individual game and is a unique record in itself!

    Importing Data into our Database

    Now we have our first data file downloaded we are ready to create our database and import this CSV file into it. The process we will be using here will apply the same way to any of the data files found on FDP since they all share the same data structure and format.

    Step 1:

    First we need to open up SQL Server Management Studio and if necessary start the database engine service as is described in the first tutorial (for those who elected not to have it start up automatically!).

    Step 2:

    Once in the management studio navigate to the “Object Explorer” toolbar on the left, right click on “databases” and then click on “New Database” like below:

    ssms right-click create database

    Step 3:

    Next, a new box will open up asking for some more information on the new database, start by giving the database a name of “football-stats-db” as is shown below:

    ssms new database dialouge box

    There are some other parameters which can be set here such as the initial size of the database and the size of the log files, there settings however are beyond the scope of this tutorial and there is no need to worry about them for now (just leave them all on default values).

    Once you have typed in the database name click on “ok” and the database will then be created. We can see that the database exists by looking under “databases” in the object

    explorer as is shown below (you might need to use the blue refresh arrow first as is pointed out):

    Step 4:

    Now we can see our database under the object explorer we are ready to import our first data file into it using the import tool. Right click on the “football-stats-db” database and under the “Tasks” sub menu select “Import Data”:

    ssms tasks import data menu

    The built-in SQL Server import tool will now open up, click on “next” to continue:

    ssms import export wizard

    Step 5:

    On the next screen we are asked to choose a data source, from the drop-down box select “Flat File Source”:

    ssms import export choose data source

    The screen will now update to allow us to specify the file we are importing, click on “Browse” and select the “england-premier-league-2019-2020.csv” file which we downloaded earlier on (NB – you might need to specify “CSV Files” in the drop-down box bottom right as it filters out only text files by default, this is highlighted below):

    ssms import selecting csv file

    After this click on “next” to continue.

    Step 6:

    On the next screen we will see a preview of the file we are importing, the import tool will try to establish the format of the file itself and show us a preview of what it comes up with:

    ssms import preview of source file

    Remembering back to the data file we downloaded earlier, the first row of columns formed the column headings and not the data itself. Looking at the image above we can see SQL Server has successfully detected these column headings by itself, this is great and means no additional settings are needed to pick them up! Underneath the headings we can see the data rows themselves all look to be formatted correctly so we can now click on “next” and continue the import process.

    Step 7:

    On the next screen we are asked to choose a destination for our import, this destination is of course our “football-stats-db” database. From the “Destination” drop down box select “SQL Server native Client 11.0” as is shown below:

    ssms import choose destination sql client

    Once we make this selection the window will update and allow us to specify a server name and database. Looking at the image below we must populate the “Server Name” field as well as the “Database” field with that of our server name and database name.

    Note that since we are using our local PC as our server that this will be filled in by default with the local server name for us (your server name will be different to mine!). Since we only have the one database this will also be filled in by default but if you do have more databases on your server in the future be careful to set this field accordingly:

    ssms import choose destination database

    After checking these both look ok, click on “next” to continue.

    Step 8:

    Next, we will see the “Select Source Tables and Views” screen. Since we are importing our data from a flat file and have already checked the formatting looks ok this screen does not need our attention, simply leave it as it is and click “next”:

    ssms import select source table and views

    Step 9:

    Finally, we will see a screen called “Save and Run Package”, we don’t need to worry about the SSIS package for now so simply click “Finish”:

    ssms import save and run package

    We will then see the final summary screen before the job runs, click “Finish” to start the import process running:

    ssms import complete wizard

     

    Step 10:

    The import will only take a few seconds to complete since we have a relatively small amount of data, once this is complete we will see a confirmation screen like so:

    ssms import wizard final screen

    Notice in the image above we see a green tick to indicate success and also see the total number of rows imported in the bottom right. The number of rows will vary depending on the league and season you are importing.

    Testing the Import Worked Correctly

    Now we have completed our first import we need to check everything has imported correctly, yes the import tool said it had imported successfully but we should always check this ourselves in case of any unforeseen issues which might have cropped up.

    Note: Whenever working with data try and get into the habit of always double checking things have worked as they should! As data is processed and re-used over time any small issues with the data will always be present going forwards and the data itself will then be wrong forever if the issue isn’t picked up early on!

    Step 1:

    First we need to go back to the object explorer in the SQL Server Management Studio and navigate to our database. In the file tree underneath our database expand the “Tables” folder and we should then see a new table with the same name as our CSV data file (england-premier-league-2019-2020) like below:

    ssms new table in object explorer

    Step 2:

    Right click on the “dbo.england-premier-league-2018-2019” table and select the “Select Top 1000 Rows” option as is shown below:

    ssms select top 100 rows from table

    A new tab will now open in the main part of the management studio showing us the top 1000 rows of data from the table. Notice in the image below how the screen is split horizontally with the data at the bottom and the SQL script used to generate this data at the top:

    ssms select top sql result

    Don’t worry too much about the SQL part right now, we will look at how SQL works in more depth in the next part of this tutorial series. The data section at the bottom is what we need to look at now to check that the data has been imported into our database correctly. If your data looks like that in the image above (and similar to how it looked in the spreadsheet) then we have a successful import!

    Where to Next?

    As was mentioned earlier this is the first in a three part series of tutorials for getting started using SQL Server. After completing this tutorials you can move on to the third tutorial (Advanced Events) and begin learning some SQL programming to start querying the database

    If you have experienced any problems whilst following this tutorial please get in touch via the support form.

    Scroll to Top