Introduction to Python with Application to Bioinformatics

- Day 3

Review Day 2

  • Give an example of a tuple
  • What is the difference between a tuple and a list?
  • How would you approach a complicated coding task?
  • How does the syntax differ between a function and a method?
  • Calculate the average of the list [1,2,3.5,5,6.2] to one decimal
  • Take the list ['i','know','python'] as input and output the string 'I KNOW PYTHON'
  • What are the characteristics of a set?
  • Create a set containing the integers 1,2,3, and 4, add 3,4,5, and 6 to the set. How long is the set?

Tuples

Give an example of a tuple:

In [ ]:
myTuple = (1,2,3,'a','b',[4,5,6])
myTuple

What is the difference between a tuple and a list?
A tuple is immutable while a list is mutable
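
A minimal sketch of what that difference means in practice; the variable names are only illustrative:

```python
myList = [1, 2, 3]
myList[0] = 'a'           # lists are mutable: item assignment works

myTuple = (1, 2, 3)
try:
    myTuple[0] = 'a'      # tuples are immutable: item assignment raises TypeError
    changed = True
except TypeError:
    changed = False

print(myList)
print(changed)
```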

How to structure code

  • Decide on what output you want
  • What input files do you have?
  • How is the input structured, can you iterate over it?
  • Where is the information you need located?
  • Do you need to save a lot of information while iterating?
    • Lists are good for ordered data
    • Sets are good for unique, single-entry information
    • Dictionaries are good for a lot of structured information
  • When you have collected the data needed, decide on how to process it
  • Are you writing your results to a file?

Always start with writing pseudocode!
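
As a small illustration of the checklist above, the sketch below writes the pseudocode as comments first and then fills in the code under each step. The task (counting unique words) and the sample lines are made up for the example and stand in for a real input file.

```python
lines = ['the cat sat', 'the dog sat']   # stand-in for lines read from an input file

# 1. create an empty set to collect the words (a set keeps only unique entries)
uniqueWords = set()
# 2. iterate over the input, one line at a time
for line in lines:
    # 3. split the line into words and save each one
    for word in line.split():
        uniqueWords.add(word)
# 4. process the collected data: count the unique words
print(len(uniqueWords))
```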

Functions and methods

How does the syntax differ between a function and a method?
functionName()     <object>.methodName()
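
A short sketch of the two call styles, using the built-in function len() and the list method sort(); the list is only an example:

```python
myList = [3, 1, 2]

length = len(myList)      # function: functionName(argument)
myList.sort()             # method: <object>.methodName()

print(length, myList)
```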

Calculate the average of the list [1,2,3.5,5,6.2] to one decimal

In [ ]:
myList = [1,2,3.5,5,6.2]
round(sum(myList)/len(myList),1)

Take the list ['i','know','python'] as input and output the string 'I KNOW PYTHON'

In [ ]:
' '.join(['i','know','python']).upper()

Sets

What are the characteristics of a set?
A set is an unordered collection of unique, immutable objects
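
A minimal sketch of these characteristics, with made-up values: duplicates collapse to one entry, a mutable object such as a list is rejected, and an immutable tuple is accepted.

```python
mySet = {3, 1, 2, 3, 1}       # duplicates collapse to a single entry
print(len(mySet))

try:
    badSet = {[1, 2], 3}      # a list is mutable, so it cannot be stored in a set
    accepted = True
except TypeError:
    accepted = False
print(accepted)

okSet = {(1, 2), 3}           # a tuple is immutable and can be stored in a set
print((1, 2) in okSet)
```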

Create a set containing the integers 1,2,3, and 4, add 3,4,5, and 6 to the set. How long is the set?

In [ ]:
mySet = {1,2,3,4}
mySet.add(3)
mySet.add(4)
mySet.add(5)
mySet.add(6)
len(mySet)

IMDb

How to find the number of movies per genre?

... Hm, starting to be difficult now...

New data type: dictionary

  • A dictionary is a mapping of unique keys to values
  • Dictionaries are mutable

Syntax:
a = {} (create empty dictionary)
d = {'key1':1, 'key2':2, 'key3':3}

In [ ]:
myDict = {'drama': 4,
          'thriller': 2,
          'romance': 5}
myDict

Operations on Dictionaries

In [ ]:
myDict = {'drama': 4,
          'thriller': 2,
          'romance': 5}
len(myDict)
myDict['drama']
myDict['horror'] = 2
#myDict
#del myDict['horror']
#myDict
'drama' in myDict
myDict.keys()
myDict.items()
myDict.values()

Exercise

In [ ]:
myDict = {'drama': 182, 
          'war': 30, 
          'adventure': 55, 
          'comedy': 46, 
          'family': 24, 
          'animation': 17, 
          'biography': 25}
  • How many entries are there in this dictionary?
  • How do you find out how many movies are in the genre 'comedy'?
  • You're not interested in biographies, delete this entry
  • You are however interested in fantasy, add that we have 29 movies of the genre fantasy to the dictionary
  • What genres are listed in this dictionary?
  • You remembered another comedy movie, increase the number of comedies by one
In [ ]:
 

Find the number of movies per genre

Hint! If the genre is not already in the dictionary, you have to add it first

Answer

In [ ]:
fh        = open('../downloads/250.imdb', 'r', encoding = 'utf-8')
genreDict = {}     # create empty dictionary

for line in fh:
    if not line.startswith('#'):
        cols  = line.strip().split('|')
        genre = cols[5].strip()
        glist = genre.split(',')
        for entry in glist:
            if entry.lower() not in genreDict:  # genre not in dictionary yet: add it with count 1
                genreDict[entry.lower()] = 1
            else:
                genreDict[entry.lower()] += 1   # genre already in dictionary: increase count by 1
fh.close()
print(genreDict)
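
The same counting pattern can be tried without the data file. The sketch below uses a few made-up genre strings in place of the genre column of 250.imdb, and swaps the if/else for dict.get() with a default of 0. That is an alternative to the approach above, not part of it.

```python
genreStrings = ['Drama,War', 'Drama', 'Comedy,Drama']   # made-up sample data

genreDict = {}
for genre in genreStrings:
    for entry in genre.split(','):
        key = entry.lower()
        # get() returns the current count, or 0 if the genre is not in the dictionary yet
        genreDict[key] = genreDict.get(key, 0) + 1

print(genreDict)
```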

What is the average length of the movies (hours and minutes) in each genre?

Answer

Tip!
Here you have to loop twice

In [ ]:
fh        = open('../downloads/250.imdb', 'r', encoding = 'utf-8')
genreDict = {}

for line in fh:
    if not line.startswith('#'):
        cols    = line.strip().split('|')
        genre   = cols[5].strip()
        glist   = genre.split(',')
        runtime = cols[3]      # length of movie in seconds
        for entry in glist:
            if not entry.lower() in genreDict:
                genreDict[entry.lower()] = [int(runtime)]   # add a list with the runtime
            else:
                genreDict[entry.lower()].append(int(runtime))   # append runtime to existing list
fh.close()
                
for genre in genreDict:      # loop over the genres in the dictionary
    average = sum(genreDict[genre])/len(genreDict[genre])  # calculate average length per genre
    hours   = int(average/3600)                                 # format seconds to hours
    minutes = (average - (3600*hours))/60             # format seconds to minutes
    print('The average length for movies in genre '+genre\
          +' is '+str(hours)+'h'+str(round(minutes))+'min')
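
As a check of the arithmetic above, here is the conversion for a single made-up average of 7530 seconds. divmod() is shown as a shorter alternative to the explicit subtraction; it is not used in the answer itself.

```python
average = 7530                                  # made-up value in seconds

hours   = int(average/3600)                     # 7530/3600 = 2.09..., so 2 whole hours
minutes = (average - (3600*hours))/60           # (7530 - 7200)/60 = 5.5 minutes

# the same split using divmod: whole hours, then the remaining seconds
hours2, rest = divmod(average, 3600)

print(hours, minutes)
print(hours2, rest/60)
```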

NEW TOPIC: Functions

A lot of ugly formatting for calculating hours and minutes from seconds...

In [ ]:
def FormatSec(genre):   # input a genre name (a key in genreDict)
    average   = sum(genreDict[genre])/len(genreDict[genre])
    hours     = int(average/3600)
    minutes   = (average - (3600*hours))/60   
    return str(hours)+'h'+str(round(minutes))+'min'


fh        = open('../downloads/250.imdb', 'r', encoding = 'utf-8')
genreDict = {}

for line in fh:
    if not line.startswith('#'):
        cols    = line.strip().split('|')
        genre   = cols[5].strip()
        glist   = genre.split(',')
        runtime = cols[3]      # length of movie in seconds
        for entry in glist:
            if not entry.lower() in genreDict:
                genreDict[entry.lower()] = [int(runtime)]   # add a list with the runtime
            else:
                genreDict[entry.lower()].append(int(runtime))   # append runtime to existing list
fh.close()
                
for genre in genreDict:
    print('The average length for movies in genre '+genre\
          +' is '+FormatSec(genre))

Function structure

In [ ]:
def addFive(number):    
    final = number + 5
    return final

addFive(4)
In [ ]:
from datetime import datetime

def whatTimeIsIt():
    time = 'The time is: ' + str(datetime.now().time())
    return time

whatTimeIsIt()
In [ ]:
def addFive(number):
    final = number + 5
    return final

addFive(4)
#final

final = addFive(4)
final

Scope

  • Variables within functions
  • Global variables
In [ ]:
def someFunction():
#    s = 'a string'
    print(s)
    
s = 'another string'
someFunction()
print(s)
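
A variation of the cell above as a sketch: when the function defines its own s, that local variable is used inside the function and the global s stays untouched.

```python
def someFunction():
    s = 'a string'         # local variable: exists only inside the function
    return s

s = 'another string'       # global variable
inside = someFunction()

print(inside)              # value of the local s
print(s)                   # the global s is unchanged
```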

Why use functions?

  • Cleaner code
  • Better defined tasks in code
  • Re-usability
  • Better structure

Importing functions

  • Collect all your functions in another file
  • Keeps main code cleaner
  • Easy to use across different code

Example:

  1. Create a file called myFunctions.py, located in the same folder as your script
  2. Put a function called formatSec() in the file
  3. Start writing your code in a separate file and import the function
In [ ]:
from myFunctions import formatSec

seconds = 32154

formatSec(seconds)
In [ ]:
from myFunctions import  formatSec, toSec

seconds = 21154
print(formatSec(seconds))

days    = 0
hours   = 21
minutes = 56
seconds = 45

print(toSec(days, hours, minutes, seconds))

myFunctions.py

Drawing
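
The contents of the file are not reproduced here. A minimal sketch of what myFunctions.py could contain is given below, assuming that formatSec() returns the same 'XhYmin' string as the earlier formatting code and that toSec() returns a total number of seconds. Both function bodies are assumptions, not the original file.

```python
# myFunctions.py (hypothetical contents)

def formatSec(seconds):
    """Format a number of seconds as a string such as '8h56min'."""
    hours   = int(seconds/3600)
    minutes = (seconds - (3600*hours))/60
    return str(hours)+'h'+str(round(minutes))+'min'

def toSec(days, hours, minutes, seconds):
    """Convert days, hours, minutes and seconds to a total number of seconds."""
    return days*86400 + hours*3600 + minutes*60 + seconds
```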

Summary

  • A function is a block of organized, reusable code that is used to perform a single, related action
  • Variables within a function are local variables
  • Functions can be organized in separate files and imported to the main code

→ Notebook Day_3_Exercise_1 (~30 minutes)

NEW TOPIC AGAIN: sys.argv

  • Avoid hardcoding the filename in the code
  • Easier to re-use code for different input files
  • Uses command-line arguments
  • Input is a list of strings:
    • Position 0: the program name
    • Position 1: the first argument

The `sys.argv` list

Python script called print_argv.py:

Drawing
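
The script itself is not reproduced here. A minimal sketch of what a script like print_argv.py might look like is shown below; the helper name describeArgs is hypothetical and only there to make the positions explicit.

```python
import sys

def describeArgs(argv):
    """Return one line per command-line argument, with its position in the list."""
    lines = []
    for position, argument in enumerate(argv):
        lines.append('Position ' + str(position) + ': ' + argument)
    return lines

# sys.argv is a list of strings: position 0 is the program name,
# position 1 is the first argument, and so on
for line in describeArgs(sys.argv):
    print(line)
```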

Running the script with command line arguments as input:

Drawing

Instead of:

Drawing

do:

Drawing

Run with:

Drawing
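
A sketch of the idea, assuming a script that counts the non-comment lines of its input file: the filename is taken from sys.argv[1] instead of being written into the code. The function names countLines and main and the usage message are made up for this example.

```python
import sys

def countLines(filename):
    """Count the lines in a file that do not start with '#'."""
    fh = open(filename, 'r', encoding = 'utf-8')
    count = 0
    for line in fh:
        if not line.startswith('#'):
            count += 1
    fh.close()
    return count

def main(argv):
    if len(argv) < 2:                      # no filename given on the command line
        return 'Usage: python ' + argv[0] + ' <input file>'
    return countLines(argv[1])             # argv[1] replaces the hardcoded filename

if __name__ == '__main__':
    print(main(sys.argv))
```

Checking the length of the argument list first gives a readable message instead of an IndexError when the filename is left out.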

IMDb

Re-structure and write the output to a new file as below

Drawing

Note:

  • Use a text editor, not a notebook, for this
  • Use functions as much as possible
  • Use sys.argv for input/output

Answer - Example

Drawing

Run with: Drawing