R was conceived ca. 1992 by Robert Gentleman and Ross Ihaka (R&R) at the University of Auckland, NZ, as a tool for teaching statistics.
2000 – first stable version (R 1.0.0) released.
A programming language is a formal computer language or constructed language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs to control the behavior of a machine or to express algorithms.[source: Wikipedia]
Think of a program as a flow of data from one function to another, where each function does something to the data.
There are three main things that define a programming language:
Syntax is the form, typically defined by a context-free (Chomsky type-2) grammar, for example:
A simple Lisp-like language can be defined by the following grammar in BNF (Backus-Naur Form):
expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z''a'-'z'].*
list ::= '(' expression* ')'
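To make the grammar concrete, here is a minimal recursive-descent parser sketched in Python. It is only an illustration, not a full Lisp reader. The function names (tokenize, parse_expression, parse) are hypothetical, and the sketch assumes that tokens are separated by whitespace or parentheses, so a symbol never contains either.

```python
import re

# Atom patterns taken from the grammar above.
# Assumption: a symbol stops at whitespace or a parenthesis.
NUMBER = re.compile(r"[+-]?[0-9]+")
SYMBOL = re.compile(r"[A-Za-z]\S*")


def tokenize(text):
    """Split the input into '(' , ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_expression(tokens, pos=0):
    """Parse one expression at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "(":                      # list ::= '(' expression* ')'
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = parse_expression(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1             # skip the closing ')'
    if token == ")":
        raise SyntaxError("unexpected ')'")
    if NUMBER.fullmatch(token):           # atom ::= number
        return int(token), pos + 1
    if SYMBOL.fullmatch(token):           # atom ::= symbol
        return token, pos + 1
    raise SyntaxError(f"not an atom: {token!r}")


def parse(text):
    """Parse a complete expression and reject trailing tokens."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree


tree = parse("(add 1 (mul 2 -3))")
```

Each grammar rule maps onto one branch of parse_expression, which is why context-free grammars are convenient to implement: the structure of the parser mirrors the structure of the rules.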
Semantics is the meaning; a grammatically correct sentence does not necessarily have a proper meaning:
Static semantics – rules checked before the program runs, typically by a compiler, e.g. that every identifier is declared before its first use or that the conditionals have distinct predicates.
Dynamic semantics – how the chunks of code are executed. For instance lazy vs. eager evaluation.
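The difference between eager and lazy evaluation can be sketched in Python, where a list comprehension is evaluated eagerly and a generator expression lazily. The helper square and the calls list are hypothetical names used only to record when the work actually happens.

```python
calls = []


def square(x):
    calls.append(x)        # record that the function actually ran
    return x * x


# Eager: all three calls to square happen immediately.
eager = [square(x) for x in range(3)]
calls_after_eager = len(calls)

# Lazy: building the generator computes nothing yet.
lazy = (square(x) for x in range(3))
calls_after_lazy_def = len(calls)

# A value is computed only when it is requested.
first = next(lazy)
calls_after_next = len(calls)
```

Both forms describe the same computation; only the moment of execution differs, which is exactly what dynamic semantics specifies.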
Static typing - type determined before execution, either declared by the programmer (manifestly typed) or inferred by the compiler (type-inferred):
integer i # Declaration
i = 1 # Initialization
Dynamic typing - type determined during execution.
i = 1
1 + '1' = ?
Types – ERROR checking!
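As a small illustration of types acting as error checking, the sketch below uses Python, a dynamically typed language. The helper try_add is a hypothetical name. In Python, adding a number to a string is rejected at run time with a TypeError; some other languages silently convert one of the operands instead, so the answer to 1 + '1' depends on the language.

```python
i = 1          # no declaration: the name i refers to an int at run time
i = "one"      # the same name may later refer to a str


def try_add(a, b):
    """Add two values and report a type error instead of crashing."""
    try:
        return a + b
    except TypeError as err:
        return f"TypeError: {err}"


ok = try_add(1, 1)         # both operands are ints
mixed = try_add(1, "1")    # int + str is rejected by Python
```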
There are many programming paradigms, e.g.:
Two major approaches exist for turning code written in a particular language into machine code:
Interpretation – on-the-fly translation of your code, theoretically line-by-line. This is done every time you run your program, and the job is done by a piece of software called an interpreter.
Compilation – your program is translated once and saved as machine code, and as such can be directly executed on the machine. The job is performed by a compiler.
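The contrast between translating every time and translating once can be sketched with Python's built-in compile and eval. This is only an analogy: compile produces bytecode for the Python virtual machine, not machine code, and the variable names are illustrative.

```python
source = "x * 2 + 1"

# Interpretation-style: the source text is translated anew
# every time it is evaluated.
results_interpreted = [eval(source, {"x": x}) for x in range(3)]

# Compilation-style: translate once, keep the translated form,
# then run it as many times as needed.
code_object = compile(source, "<example>", "eval")
results_compiled = [eval(code_object, {"x": x}) for x in range(3)]
```

Both lists hold the same values; the difference is how often the translation step is repeated.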
Divide et impera – Divide and rule.
Top-down approach: define the big problem and split it into smaller ones. Assume you have solutions to the small problems and continue – push the responsibility down. Wishful thinking!
You've got a CSV file that contains data about people:
year of birth, favorite music genre, salary and the name
of a pet if the person has one. Your task is to read the
data and, for people born in particular decades (...,
1950s, 1960s, ...), compute the mean and the variance of
salary and find the most frequent pet name.
This task can be decomposed into:
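One possible decomposition is sketched below in Python, with one small function per sub-problem. Everything specific here is an assumption made for illustration: the column names (year_of_birth, music_genre, pet_name, salary), the function names and the sample rows. The standard library functions statistics.mean and statistics.pvariance stand in for the mean and variance steps.

```python
import csv
import io
import statistics
from collections import Counter, defaultdict


def decade_of(year):
    """Map a year of birth to the first year of its decade (1953 -> 1950)."""
    return (year // 10) * 10


def most_frequent(names):
    """Return the most common non-empty name, or None if there is none."""
    names = [name for name in names if name]
    return Counter(names).most_common(1)[0][0] if names else None


def summarize(csv_file):
    """Group rows by decade of birth and summarize each group."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[decade_of(int(row["year_of_birth"]))].append(row)

    summary = {}
    for decade, rows in sorted(groups.items()):
        salaries = [float(row["salary"]) for row in rows]
        summary[decade] = {
            "mean_salary": statistics.mean(salaries),
            "salary_variance": statistics.pvariance(salaries),
            "top_pet": most_frequent(row["pet_name"] for row in rows),
        }
    return summary


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "year_of_birth,music_genre,pet_name,salary\n"
    "1953,jazz,Rex,1000\n"
    "1958,rock,Rex,3000\n"
    "1961,pop,,2000\n"
)
result = summarize(sample)
```

The top-level function summarize was written as if the helpers already existed, which is the wishful thinking of the top-down approach; each helper is then small enough to solve directly.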
To compute the mean you have to: sum all values and divide the sum by the number of values. Simple enough, we can program it right away.
To compute the variance you first need to refresh the formula: \[\mathrm{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})^2\]
Thus, you realise that you need to compute the mean, but you already know how to do this from the previous point. So, instead of coding the computation of the mean twice, make a function that you can reuse! Laziness is the major driving force of a programmer!
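The reuse described here can be sketched in Python: the variance function calls the mean function instead of repeating its code. The names compute_mean and compute_variance are illustrative, and the sketch assumes a non-empty list of numbers and the population formula (division by n) given above.

```python
def compute_mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def compute_variance(values):
    """Population variance: the mean of squared deviations from the mean."""
    mean = compute_mean(values)          # reuse, do not re-implement
    squared_deviations = [(x - mean) ** 2 for x in values]
    return compute_mean(squared_deviations)


data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = compute_variance(data)        # mean is 5, variance is 4
```

Note that compute_mean is reused twice: once for the mean of the data and once for the mean of the squared deviations, since dividing their sum by n is itself a mean.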
Let’s put it down!
\[\mathrm{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \bar{x})^2\]
Task: create the computeMean procedure that computes the mean for a sequence of numbers
Input: a sequence of numbers, e.g.: {1, 4, 5.7, 42357.533, 42}. Wait, isn’t it a vector?
Output: the computed mean, a single number, that is what we want our procedure to return.
function computeMean(aVector) {
    sum = sum all numbers in aVector
    count = count how many numbers are in aVector
    theMean = sum / count
    return theMean
}
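The pseudocode above translates almost line by line into a runnable sketch, here in Python. The name compute_mean mirrors computeMean, and the check for an empty input is an added assumption, since the pseudocode does not say what should happen in that case.

```python
def compute_mean(a_vector):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(a_vector) == 0:
        # Assumption: an empty input is treated as an error.
        raise ValueError("cannot compute the mean of an empty sequence")
    total = 0.0
    count = 0
    for number in a_vector:
        total += number      # sum all numbers in a_vector
        count += 1           # count how many numbers there are
    return total / count


# The example input from the task description.
the_mean = compute_mean([1, 4, 5.7, 42357.533, 42])   # approximately 8482.0466
```

Writing the loop explicitly keeps the correspondence with the pseudocode visible; in practice the built-in sum and len would do the same job in one line.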
So far, we have learnt about:
Quite a bit, right?