Warning: This document is for an old version of IntroQG. The main version is master.

Exercise 2

Warning

Please note that we provide assignment feedback only for students enrolled in the course at the University of Helsinki.

Start your assignment

You can start working on your copy of Exercise 2 by accepting the GitHub Classroom assignment.

Exercise 2 is due by the start of lecture in week 3.

You can also take a look at the open course copy of Exercise 2 in the course GitHub repository (does not require logging in). Note that you should not try to make changes to this copy of the exercise, but rather only to the copy available via GitHub Classroom.

General hints for Exercise 2

Formatting numbers in Python

As you may have noticed, the numbers you add to your plot using the plt.text() command look kind of ugly with so many digits after the decimal place. You can make them look a bit nicer by rounding them when you call the plt.text() command. This is called formatted output in Python, and it is nice not only because it can make your text easier to read, but also because the formatting does not change the data values themselves, only their display. To help you in formatting your output for your plots, here are some examples of how the output formatting works.

In [1]: import numpy as np

In [2]: pi = np.pi

In [3]: print('The value of pi is', pi)
The value of pi is 3.141592653589793

In [4]: print('The rounded value of pi is {0:.2f}'.format(pi))
The rounded value of pi is 3.14

Huh? Perhaps a bit more information is needed beyond the simple example above. In Python, you can format values when you insert them into a character string. You mark each place where a formatted value should appear by putting curly brackets { } inside the quotation marks of the string. In the example above, the 0 indicates that the first value given to .format() should be inserted there (remember, Python indexing starts at 0). After the colon that follows the 0, you state how the value should be displayed. In this case, .2f indicates that it should be displayed as a decimal (floating point) value with 2 digits after the decimal point. Directly after the closing quotation mark of the string, you add .format() with the values you want formatted listed inside the parentheses. With this in mind, let’s consider another example.

In [5]: print('The sine of pi is {0:.4f}, and the cosine of pi is {1:.1f}.'.format(np.sin(pi), np.cos(pi)))
The sine of pi is 0.0000, and the cosine of pi is -1.0.

In [6]: print('In scientific notation, the sine of pi is {0:.6e}. As an integer value it is {0:.0f}.'.format(np.sin(pi)))
In scientific notation, the sine of pi is 1.224647e-16. As an integer value it is 0.

Above, you see a few additional things about string formatting. First, you can format more than one value in a single string by changing the number to the left of the : within the curly brackets. Second, you can format the same value in different ways within a single character string by reusing its number. You may not often need to do this, but it is possible.

You can find much more about string formatting on the Python documentation site.
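
As a small sketch of how this could be used for plot labels, the example below builds a rounded text string from two values. The variable names and numbers are hypothetical and only for illustration; in the exercise the values would come from your own calculations.

```python
# Hypothetical values, for illustration only
intercept = -12.3456789
slope = 0.0123456789

# Two decimals for the intercept, scientific notation for the small slope
label = 'A = {0:.2f}, B = {1:.2e}'.format(intercept, slope)
print(label)   # A = -12.35, B = 1.23e-02

# The stored value is unchanged; only the displayed text is rounded
print(slope)   # 0.0123456789
```

A string such as label can then be given as the text argument when you call plt.text().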

Hints for Problem 1

Calculating summations

There are several ways in which you can calculate a summation in Python, including using the sum() function. However, it is often better to use a for loop for summing values, particularly if you are calculating values and don’t want to store their results separately from the summation. Perhaps an example will make this clear:

In [7]: numbers = [1,2,3,4,5,6,7,8,9]

In [8]: squaredSum = 0       # Set sum equal to zero before starting for loop

In [9]: for i in range(len(numbers)):
   ...:     squaredSum = squaredSum + numbers[i]**2.0
   ...: 

In [10]: squaredSum
Out[10]: 285.0

In [11]: sum(numbers)         # Sum of the list values
Out[11]: 45

In [12]: sum(numbers)**2.0    # Square of the sum of the list values, not equal to squaredSum
Out[12]: 2025.0

The point here is that in order to calculate the sum of the squared values in a Python list, you need to square each value in the list separately and then add the squares together. You could save the squared values in a separate list and use the sum() function, but in this case it may be clearer and more logical to use a for loop. You may want to do something similar for calculating your chi squared values.
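
For comparison, here is a minimal sketch of the same sum of squares written two ways: a for loop that goes through the list values directly rather than their index values, and the sum() function applied to squares that are calculated on the fly. The variable names squared_sum and generator_sum are made up for this example.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# For loop: square each value and add it to the running total
squared_sum = 0
for value in numbers:
    squared_sum = squared_sum + value**2

# sum() with the squares calculated one at a time, without storing them in a list
generator_sum = sum(value**2 for value in numbers)

print(squared_sum, generator_sum)   # 285 285
```

Both approaches give the same result. The for loop may be easier to follow when the quantity being summed takes several steps to calculate.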

Shorthand Python notation for adding to a variable

It is extremely common to need to add to an existing variable in computer programs. Because of this, there is a shorthand notation in Python for just this kind of operation.

In [13]: number = 34

In [14]: number = number + 5

In [15]: print(number)
39

In [16]: number += 5

In [17]: print(number)
44

As you can see, number += 5 is exactly the same as number = number + 5, just written a bit more compactly. As you might imagine, there are similar shortcuts for subtracting (-=), multiplying (*=), and dividing (/=).
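
The other shortcuts work the same way. Here is a short sketch that continues from the value 44 above.

```python
number = 44

number -= 4      # Same as number = number - 4
print(number)    # 40

number *= 2      # Same as number = number * 2
print(number)    # 80

number /= 8      # Same as number = number / 8; division gives a float in Python 3
print(number)    # 10.0
```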

Hints for Problem 2

Returning more than one value

In your linregress() function you are asked to calculate the y-intercept A and slope B for the best-fit line to your temperature data. Ideally, this means you would have your function return more than just one value. This is no problem, but perhaps an example of the syntax would be helpful. Let’s have a look.

In [18]: def name_split(name):
   ....:     """Splits a full name into first and last names."""
   ....:     first = name.split()[0]
   ....:     last = name.split()[1]
   ....:     return first,last
   ....: 

In [19]: boatname = "Boaty McBoatface"

In [20]: firstname, lastname = name_split(boatname)

In [21]: print("The first name is "+firstname+" and the last name is "+lastname+".")
The first name is Boaty and the last name is McBoatface.
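
The same syntax works for numerical values. As a sketch, the hypothetical function line_through() below returns the y-intercept and slope of the line through two points. It is not the linregress() function from the exercise; it only shows how two calculated values can be returned and then stored in separate variables.

```python
def line_through(point1, point2):
    """Returns the y-intercept and slope of the line through two (x, y) points."""
    x1, y1 = point1
    x2, y2 = point2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope

# The two returned values are assigned to A and B, in the order they are returned
A, B = line_through((0.0, 1.0), (2.0, 5.0))
print('A = {0:.1f}, B = {1:.1f}'.format(A, B))   # A = 1.0, B = 2.0

# If you assign the output to a single variable, you get both values as a tuple
result = line_through((0.0, 1.0), (2.0, 5.0))
print(result)   # (1.0, 2.0)
```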

Plotting your regression lines

The plt.plot() function needs at least two (x, y) points to be able to draw a line. For your regression lines you have calculated the y-intercept A and slope B, which define a line once you have a range of values for x. If you use the range of years as the x values, you can insert them along with your A and B values into the equation of a line, y = A + Bx, to calculate the y values of the regression line over the range of years shown in the plot.
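
A minimal sketch of that calculation is shown below. The values of A and B and the list of years are hypothetical and only for illustration; you would use the values from your own regression and data.

```python
# Hypothetical regression results and year range, for illustration only
A = -20.0      # y-intercept
B = 0.0125     # Slope (change in y per year)
years = [1960, 1980, 2000, 2020]

# Equation of a line, y = A + B * x, calculated for each year
line_values = [A + B * year for year in years]
print(line_values)
```

The two lists have the same length, so they could be given to plt.plot() as the x and y values to draw the line.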