Introduction to Python

with Application to Bioinformatics

- Day 1

Who we are

Uppsala

Nina Malin Verena Dimitrios Johan Martin

Umeå

Jeanette Allison Matus Pedro

Who you are

Drawing

Schedule

Drawing

Check

  • Has everyone managed to install Python?
  • Have you managed to run the test script?
  • Have you installed notebooks? (optional)

What is programming?

Wikipedia:

"Computer programming is the process of building and designing an executable computer program for accomplishing a specific computing task"

What can we use it for?

Endless possibilities!

  • reverse complement DNA
  • custom filtering of VCF files
  • plotting of results
  • all Excel stuff!

Why Python?

Typical workflow

  1. Get data
  2. Clean, transform data in spreadsheet
  3. Copy-paste, copy-paste, copy-paste
  4. Run analysis & export results
  5. Realise the columns were not sorted correctly
  6. Go back to step 2, Repeat


Python versions

  • Python 1.0 - January 1994
  • Python 1.2 - April 10, 1995
  • Python 1.3 - October 12, 1995
  • Python 1.4 - October 25, 1996
  • Python 1.5 - December 31, 1997
  • Python 1.6 - September 5, 2000
  • Python 2.0 - October 16, 2000
  • Python 2.1 - April 17, 2001
  • Python 2.2 - December 21, 2001
  • Python 2.3 - July 29, 2003
  • Python 2.4 - November 30, 2004
  • Python 2.5 - September 19, 2006
  • Python 2.6 - October 1, 2008
  • Python 2.7 - July 3, 2010
  • Python 3.0 - December 3, 2008
  • Python 3.1 - June 27, 2009
  • Python 3.2 - February 20, 2011
  • Python 3.3 - September 29, 2012
  • Python 3.4 - March 16, 2014
  • Python 3.5 - September 13, 2015
  • Python 3.6 - December 23, 2016
  • Python 3.7 - June 27, 2018


Some good advice

  • 5 days to learn Python is not much
  • Amount of information will decrease over days
  • Complexity of tasks will increase over days
  • Read the error messages!
  • Save all your code

How to seek help:

  • Google
  • Ask your neighbour
  • Ask an assistant


Example of a simple Python script

In [ ]:
# A simple loop that adds 2 to a number
i = 0
while i < 10:
    u = i + 2
    print('u is',u)
    i += 1

Comment

All lines starting with # are interpreted by Python as comments and are not executed. Comments are important for documenting code and are considered good practice in all types of programming.

Literals

All literals have a type:

  • Strings (str)      'Hello' "Hi"
  • Integers (int)     5
  • Floats (float)     3.14
  • Booleans (bool)    True or False

Literals define values

In [ ]:
'this is a string'
"this is also a string"
3       # here we can put a comment so we know that this is an integer
3.14    # this is a float
True    # this is a boolean
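
To check which type Python assigns to a literal, the built-in type() function can be used. A minimal sketch; the variable names are chosen only for this illustration:

```python
# type() reports the type of any value
string_type = type('this is a string')
int_type    = type(3)
float_type  = type(3.14)
bool_type   = type(True)

print(string_type)   # <class 'str'>
print(int_type)      # <class 'int'>
print(float_type)    # <class 'float'>
print(bool_type)     # <class 'bool'>
```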

Collections

In [ ]:
[3, 5, 7, 4, 99]       # this is a list of integers

('a', 'b', 'c', 'd')   # this is a tuple of strings
{'a', 'b', 'c'}        # this is a set of strings
{'a':3, 'b':5, 'c':7}  # this is a dictionary with strings as keys and integers as values
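
The collections above differ in how their elements are stored and accessed. A small sketch, with variable names invented for this example:

```python
numbers = [3, 5, 7, 4, 99]           # list: ordered, accessed by index
letters = ('a', 'b', 'c', 'd')       # tuple: ordered, accessed by index
unique  = {'a', 'b', 'c', 'a'}       # set: duplicates are dropped
values  = {'a':3, 'b':5, 'c':7}      # dictionary: accessed by key

print(numbers[0])     # 3
print(letters[-1])    # d
print(len(unique))    # 3
print(values['b'])    # 5
```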

What operations can we do with different values?

That depends on their type:

In [ ]:
'a string'+' another string'

Type          Operations

int           + - * / ** % // ...
float         + - / * % // ...
string        +
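
As an illustration of the operators listed above, here is a short sketch using two integers and two strings (the variable names a, b and greeting are only examples):

```python
a = 7
b = 2

print(a + b)    # 9     addition
print(a - b)    # 5     subtraction
print(a * b)    # 14    multiplication
print(a / b)    # 3.5   division always gives a float
print(a // b)   # 3     floor division
print(a % b)    # 1     remainder (modulo)
print(a ** b)   # 49    exponentiation

# + on strings joins them together
greeting = 'a string' + ' another string'
print(greeting)   # a string another string
```

Mixing types does not always work: 'a string' + 3 raises a TypeError, since + is not defined between a str and an int.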

Identifiers

Identifiers are used to identify a program element in the code.

For example:

  • Variables
  • Functions
  • Modules
  • Classes

Variables

Used to store values and to assign them a name.

Examples:

  • i = 0
  • counter = 5
  • snpname = 'rs2315487'
  • snplist = ['rs21354', 'rs214569']
In [ ]:
width  = 23564
height = 10

snpname = 'rs56483'
snplist = ['rs12345','rs458782']

How to correctly name a variable

Drawing

Allowed:             Not allowed:
Var_name             2save
_total               *important
aReallyLongName      Special%
with_digit_2         With   spaces
dkfsjdsklut   (well, allowed, but NOT recommended)

NO special characters:
+ - * $ % ; : , ? ! { } ( ) < > " ' | \ @

Reserved keywords

Drawing

These words cannot be used as variable names
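
Python itself can tell whether a name is allowed. The sketch below uses the string method isidentifier() and the standard keyword module; the list of candidate names is taken from the naming examples and is only for illustration:

```python
import keyword

candidates = ['Var_name', '_total', 'with_digit_2', '2save', 'Special%', 'With spaces']

# isidentifier() is True when a string follows the naming rules
valid = []
for name in candidates:
    if name.isidentifier():
        valid.append(name)
print(valid)                         # ['Var_name', '_total', 'with_digit_2']

# Reserved keywords follow the naming rules but are still not allowed
print('while'.isidentifier())        # True
print(keyword.iskeyword('while'))    # True
print(keyword.iskeyword('fruit'))    # False

# keyword.kwlist holds all reserved keywords of the running Python version
print(keyword.kwlist)
```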

Summary

  • Comment your code!
  • Literals define values and can have different types (strings, integers, floats, booleans)
  • Values can be collected in lists, tuples, sets, and dictionaries
  • The operations that can be performed on a certain value depend on its type
  • Variables are identified by a name and are used to store a value or collections of values
  • Name your variables using descriptive words, without special characters, and avoid reserved keywords

→ Notebook Day_1_Exercise_1 (~30 minutes)

NOTE!

How to get help?

Python standard library

Drawing

Example print() and str()

Drawing

Note!
Here we format everything to a string before printing it
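
A minimal sketch of what converting to a string before printing can look like (the variables width, height and message are only examples):

```python
width  = 5
height = 3.6

# + only joins strings, so the numbers are converted with str() first
message = 'Width is ' + str(width) + ' and height is ' + str(height)
print(message)             # Width is 5 and height is 3.6

# print() also accepts several values separated by commas
# and converts each of them to a string on its own
print('Width is', width)   # Width is 5
```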

Python standard library

Drawing

In [ ]:
width  = 5
height = 3.6
snps   = ['rs123', 'rs5487']
snp    = 'rs2546'
active = True
nums   = [2,4,6,8,4,5,2]

sum(nums)
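
A few more built-in functions that can be tried on the same list. This is only a small selection, and the result variable names are made up for this sketch:

```python
nums = [2,4,6,8,4,5,2]

total    = sum(nums)      # adds all elements
count    = len(nums)      # number of elements
largest  = max(nums)      # largest element
smallest = min(nums)      # smallest element
ordered  = sorted(nums)   # returns a new, sorted list

print(total)       # 31
print(count)       # 7
print(largest)     # 8
print(smallest)    # 2
print(ordered)     # [2, 2, 4, 4, 5, 6, 8]
print(nums)        # [2, 4, 6, 8, 4, 5, 2]   the original list is unchanged
```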

More on operations

Drawing

In [ ]:
x = 4
y = 3
z = [2, 3, 6, 3, 9, 23]

max(z)

Comparison operators

Drawing

Can be used on int, float, str, and bool. Outputs a boolean.

In [ ]:
x = 5
y = 3

#x = 5.14
#y = 3.14

y + 2 == x
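
The comparison operators can be tried on other types as well. A short sketch; the values are only examples:

```python
x = 5
y = 3

print(x > y)               # True
print(x == y)              # False
print(x != y)              # True
print(x >= 5)              # True
print(5 == 5.0)            # True, an int and a float with the same value are equal

# Strings are compared character by character
print('abc' < 'abd')       # True
# Uppercase letters sort before lowercase letters
print('Zebra' < 'apple')   # True
```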

Logical operators

Drawing

Membership operators

Drawing

In [ ]:
x = 2
y = 3

x == 2 and y == 5

#x = [2,4,7,3,5,9]
#y = ['a','b','c']

#23 in x
#4 in x and 'd' in y
In [ ]:
# A simple loop that adds 2 to a number and checks if the number is even
i    = 0
even = [2,4,6,8,10]
while i < 10:
    u = i + 2
    print('u is '+str(u)+'. Is this number even? '+str(u in even))
    i += 1
In [ ]:
# A simple loop that adds 2 to a number and checks if the number is even and below 5
i    = 0
even = [2,4,6,8,10]
while i < 10:
    u = i + 2
    print('u is '+str(u)+'. Is this number even and below 5? '+\
          str(u in even and u < 5))
    i += 1

Order of precedence

There is an order of precedence for all operators:

Drawing

Word of caution when using operators

In [ ]:
x = 5
y = 7
z = 2
(x > 6 and y == 7) or z > 1

#x > 6 and (y == 7 or z > 1)
#(x > 6 and y == 7) or z > 1

#x > 4 or y == 6 and z > 3
#x > 4 or (y == 6 and z > 3)
#(x > 4 or y == 6) and z > 3
In [ ]:
# BEWARE!
x = 5
y = 8

x > 2 or xx == 6 and xxx == 6
x > 42 or (y < 8 and someRandomVariable > 1000)

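Python uses short-circuit evaluation for the operators and/or: the right-hand side is only evaluated when the left-hand side does not already decide the result. This is why the undefined variable names above never cause an error.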
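
A small sketch of when the right-hand side is skipped and when it is not. The name not_defined is hypothetical and deliberately never assigned; try/except is used here only to show the error without stopping the script:

```python
x = 5

# or stops as soon as the left side is True,
# so not_defined is never looked up
first = x > 2 or not_defined == 6
print(first)     # True

# and stops as soon as the left side is False
second = x > 42 and not_defined == 6
print(second)    # False

# When the left side does not decide the result,
# the right side IS evaluated and the undefined name causes an error
try:
    x > 42 or not_defined == 6
except NameError as error:
    print('NameError:', error)
```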

More on sequences (For example strings and lists)

Lists (and strings) are an ORDERED collection of elements where every element can be accessed through an index.

Drawing

In [ ]:
l = [2,3,4,5,3,7,5,9]
s = 'somelongrandomstring'

#s[0]
#s[0:4]
#s[0:4:2]
#s[0] = 'S'
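
For reference, a sketch of what indexing and slicing return for the same list and string. Indexing starts at 0, and a slice [start:stop:step] does not include the stop position:

```python
l = [2,3,4,5,3,7,5,9]
s = 'somelongrandomstring'

print(s[0])        # s      first character
print(s[0:4])      # some   characters at index 0, 1, 2 and 3
print(s[0:4:2])    # sm     every second character of the slice above
print(s[-1])       # g      negative indices count from the end
print(l[2:5])      # [4, 5, 3]
print(len(s))      # 20
```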

Mutable vs Immutable objects



Mutable objects can be altered after creation, while immutable objects can't.

Immutable objects:      Mutable objects:

  • int                   • list
  • float                 • set
  • bool                  • dict
  • str
  • tuple
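
The difference can be seen by trying to change one element of a list and one character of a string. A minimal sketch with made-up values; try/except is used only to show the error message without stopping the script:

```python
numbers = [1, 2, 3]      # a list is mutable
numbers[0] = 100         # changes the list in place
print(numbers)           # [100, 2, 3]

name = 'rs12345'         # a string is immutable
try:
    name[0] = 'R'        # not allowed for strings
except TypeError as error:
    print('TypeError:', error)

# Instead, build a new string and assign it to the same name
name = 'R' + name[1:]
print(name)              # Rs12345
```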

Operations on mutable sequences

Drawing

In [ ]:
s = [0,1,2,3,4,5,6,7,8,9]
s.insert(5,10)

s
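
A sketch of a few more list methods that change the list in place, continuing from the insert() example. The methods shown are a selection, not a complete list:

```python
s = [0,1,2,3,4,5,6,7,8,9]

s.insert(5, 10)    # insert the value 10 at index 5
print(s)           # [0, 1, 2, 3, 4, 10, 5, 6, 7, 8, 9]

s.append(11)       # add 11 to the end
s.remove(10)       # remove the first occurrence of the value 10
last = s.pop()     # remove and return the last element
s[0] = 100         # replace the element at index 0
s.reverse()        # reverse the order in place

print(last)        # 11
print(s)           # [9, 8, 7, 6, 5, 4, 3, 2, 1, 100]
```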

Summary

  • The Python standard library has many regularly used built-in functions
  • Operators are used to carry out computations on different values
  • Three types of operators: comparison, logical, and membership
  • Order of precedence is crucial!
  • Mutable objects can be changed after creation, while immutable objects cannot





→ Notebook Day_1_Exercise_2 (~30 minutes)

Loops in Python

In [ ]:
fruits = ['apple','pear','banana','orange']

print(fruits[0])
print(fruits[1])
print(fruits[2])
print(fruits[3])
In [ ]:
fruits = ['apple','pear','banana','orange']

for fruit in fruits:
    print(fruit)

Always remember to INDENT your loops!

Different types of loops

For loop

In [ ]:
fruits = ['apple','pear','banana','orange']

for fruit in fruits:
    print(fruit)
print('end')

While loop

In [ ]:
fruits = ['apple','pear','banana','orange']

i = 0
while i < len(fruits):
    print(fruits[i])
    i = i + 1

Different types of loops

For loop

Is a control flow statement that performs a fixed operation over a known number of steps.

While loop

Is a control flow statement that allows code to be executed repeatedly based on a given Boolean condition.



Which one to use?

For loops are better for simple iterations over lists and other iterable objects

While loops are more flexible and can iterate an unspecified number of times
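
To illustrate the difference, a sketch with one loop of each kind. The SNP names and the halving task are made up for this example:

```python
# for loop: the number of steps is known in advance (one per element)
snplist = ['rs12345', 'rs458782']
for snp in snplist:
    print(snp)

# while loop: the number of steps is not known in advance,
# the loop runs until the condition becomes False
value = 100
steps = 0
while value >= 1:
    value = value / 2
    steps += 1

print(steps)    # 7
print(value)    # 0.78125
```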

→ Notebook Day_1_Exercise_3 (~20 minutes)

Conditional if/else statements

Drawing

In [ ]:
shopping_list = ['bread', 'egg', 'butter', 'milk']

if len(shopping_list) > 2:
    print('Go shopping!')
else:
    print('Nah! I\'ll do it tomorrow!')
In [ ]:
shopping_list = ['bread', 'egg', 'butter', 'milk']
tired         = True

if len(shopping_list) > 2:
    if not tired:
        print('Go shopping!')
    else:
        print('Too tired, I\'ll do it later')
else:
    if not tired:
        print('Better get it over with today anyway')
    else:
        print('Nah! I\'ll do it tomorrow!')

This is an example of a nested conditional
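
The same decisions can also be written without nesting by combining conditions with and/not. The sketch below uses elif, which is standard Python but not introduced in the example above; the variables long_list and message are made up for this illustration:

```python
shopping_list = ['bread', 'egg', 'butter', 'milk']
tired         = True

long_list = len(shopping_list) > 2

if long_list and not tired:
    message = 'Go shopping!'
elif long_list and tired:
    message = 'Too tired, I\'ll do it later'
elif not tired:
    message = 'Better get it over with today anyway'
else:
    message = 'Nah! I\'ll do it tomorrow!'

print(message)   # Too tired, I'll do it later
```

Whether the nested or the flat version is easier to read depends on the situation; both give the same result here.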

Putting everything into a Python script

Any longer pieces of code that have been used and will be re-used SHOULD be saved

Two options:

  • Save it as a text file and make it executable
  • Save it as a notebook file



Examples

Things to remember when working with scripts

  • Put #!/usr/bin/env python3 at the beginning of the file
  • Make the file executable to run it with ./script.py
  • Otherwise run the script with python script.py

Working on files

In [ ]:
fruits = ['apple','pear','banana','orange']

for fruit in fruits:
    print(fruit)

Drawing

In [ ]:
fh = open('../files/fruits.txt', 'r', encoding = 'utf-8')

for line in fh:
    print(line.strip())

fh.close()
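
An alternative way to open files, not used in the example above, is the with statement, which closes the file automatically. The sketch below first writes a small example file to a temporary folder so that it can be run anywhere; the file content is made up:

```python
import os
import tempfile

# Create a small example file (path and content are made up for this sketch)
folder = tempfile.mkdtemp()
path   = os.path.join(folder, 'fruits.txt')

with open(path, 'w', encoding = 'utf-8') as out:
    out.write('apple\npear\nbanana\norange\n')

# Read the file line by line; it is closed when the with block ends
fruits = []
with open(path, 'r', encoding = 'utf-8') as fh:
    for line in fh:
        fruits.append(line.strip())

print(fruits)       # ['apple', 'pear', 'banana', 'orange']
print(fh.closed)    # True
```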

Pause for additional useful methods:



'string'.strip()       Removes leading and trailing whitespace
'string'.split()       Splits on whitespace into a list

In [ ]:
s  = 'an example string to split with whitespace in end   '
sw = s.strip()
sw
#l  = sw.split()
#l
#l  = s.strip().split()
#l
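
For reference, a sketch of what strip() and split() return for the string above. The comma-separated line at the end is a made-up example showing that split() can also take a separator:

```python
s = 'an example string to split with whitespace in end   '

sw = s.strip()             # remove whitespace at the start and end
print(sw)                  # an example string to split with whitespace in end

words = s.strip().split()  # split on whitespace into a list
print(words)
print(len(words))          # 9

# split() can also split on a given separator
csv_line = 'rs123,A,G\n'
fields   = csv_line.strip().split(',')
print(fields)              # ['rs123', 'A', 'G']
```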

Drawing

In [ ]:
fh = open('../files/fruits.txt', 'r', encoding = 'utf-8')

for line in fh:
    print(line.strip())

fh.close()

Another example

How much money is spent on ICA?

In [ ]:
fh    = open("../files/bank_statement.txt", "r", encoding = "utf-8")

total = 0

for line in fh:
    expenses = line.strip().split()  # split line into list
    store    = expenses[0]           # save what store
    price    = float(expenses[1])    # save the price
    if store == 'ICA':               # only count the price if store is ICA
        total = total + price
fh.close()

print('Total amount spent on ICA is: '+str(total))  

Slightly more complex...

Drawing

How much money is spent on ICA in September?

In [ ]:
fh    = open("../files/bank_statement_extended.txt", "r", encoding = "utf-8")

total = 0

for line in fh:
    if not line.startswith('store'):
        expenses = line.strip().split()
        store    = expenses[0]
        year     = expenses[1]
        month    = expenses[2]
        day      = expenses[3]
        price    = float(expenses[4])
        if store == 'ICA' and month == '09':   # store has to be ICA and month September
            total = total + price
fh.close()

out = open("../files/bank_statement_result.txt", "w", encoding = "utf-8")   # open a file for writing the results to
out.write('Total amount spent on ICA in September is: '+str(total))
out.close()
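
To try the script without the course files, the sketch below writes a made-up bank statement to a temporary folder first. The column layout (store, year, month, day, price) is an assumption based on how the script above reads each line, and the with statement is used in place of open()/close():

```python
import os
import tempfile

# Made-up bank statement, assuming the columns: store year month day price
statement = (
    'store year month day price\n'
    'ICA 2018 09 03 250.50\n'
    'Coop 2018 09 05 99.00\n'
    'ICA 2018 10 01 120.00\n'
    'ICA 2018 09 21 49.50\n'
)

folder   = tempfile.mkdtemp()
in_path  = os.path.join(folder, 'bank_statement_extended.txt')
out_path = os.path.join(folder, 'bank_statement_result.txt')

with open(in_path, 'w', encoding = 'utf-8') as fh:
    fh.write(statement)

total = 0
with open(in_path, 'r', encoding = 'utf-8') as fh:
    for line in fh:
        if not line.startswith('store'):       # skip the header line
            expenses = line.strip().split()
            store    = expenses[0]
            month    = expenses[2]
            price    = float(expenses[4])
            if store == 'ICA' and month == '09':
                total = total + price

with open(out_path, 'w', encoding = 'utf-8') as out:
    out.write('Total amount spent on ICA in September is: ' + str(total) + '\n')

print(total)   # 300.0
```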

Summary

  • Python has two types of loops: for loops and while loops
  • Loops can be used on any iterable types and objects
  • If/Else statements are used when deciding actions depending on a condition that evaluates to a boolean
  • Several If/Else statements can be nested
  • Save code as a notebook or as a text file that can be run using python
  • The function open() can be used to read in text files
  • A text file is iterable, meaning it is possible to loop over the lines

→ Notebook Day_1_Exercise_4