20130521

begin erlang


copyright : http://www.cis.upenn.edu/~matuszek/General/ConciseGuides/concise-erlang.html


A Concise Guide to Erlang
Copyright ©2010, David Matuszek

About Erlang

Erlang is an expression-oriented, single-assignment, garbage-collected functional language (largely, though not strictly, pure: message passing, I/O, and the process dictionary have side effects). There are no loops, so recursion is heavily used.
Erlang is quite a small language. It is of interest primarily because of its approach to concurrency, using Actors. Actors have subsequently been incorporated into other languages, most importantly Clojure and Scala. Erlang is most suitable for building extremely reliable, fault-tolerant systems that do not need to be shut down in order to be upgraded. Its extremely convenient bit-manipulation makes it an excellent language for low-level communications.

Running Erlang

As with many languages, Erlang can be run in a REPL (Read-Eval-Print-Loop) "shell." Short pieces of code can be tested directly in the shell. To start the shell, enter erl at the command line. Within the shell,
  • Use   c(module).   to compile a module named module from the file module.erl. The parameter to c should be an atom; a quoted file name, as in c("module.erl"), also works.
  • Use   f().   to clear (forget) previous associations.
  • Use the up and down arrows to choose a previous entry.
  • Use control-C (then a to abort), or call q(), to exit the shell.
Compared to the REPL for many other languages, Erlang's REPL is quite limited:
  • Directives beginning with a minus sign cannot be used in the REPL. In particular, you cannot import any files.
  • Functions cannot be defined in the REPL.
  • Except for a very few built-in functions, function calls must be prefixed by the name of the module in which they are defined; for example, lists:map(args) or my_module:my_function(args).

Directives

Every Erlang program should begin with a module directive, of the form
          -module(filename).
and saved in a file with the name filename.erl.
To provide functions defined in this file to other programs, use
          -export([function1/arity1, ..., functionN/arityN]).
where the "arity" is the number of parameters expected by the function.
To use functions defined in another file, use
          -import(filename, [function1/arity1, ..., functionN/arityN]).
where the "arity" is as above. Imported methods may be called without a filename: prefix.
To define a record:
          -record(Name, {Key1 = Default1, ..., KeyN = DefaultN}).
where the Keys are atoms; the default values are optional. Records may be defined in Erlang source files or in files with the extension .hrl, but may not be defined in the REPL.
To specify compiler options:
          -compile(Options).
The export_all option is useful for debugging, but should be avoided in production code.
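The directives above can be seen together in a small module. This is only a sketch: the module name geometry, the point record, and the functions are hypothetical names chosen for illustration.

```erlang
%% File: geometry.erl (hypothetical module; the file name must match the module name)
-module(geometry).
-export([area/1, total_area/1, origin/0]).
-import(lists, [sum/1]).

%% A record declaration with a default value for each key.
-record(point, {x = 0, y = 0}).

%% Exported, so other modules can call geometry:area(Shape).
area({square, Side}) -> Side * Side;
area({rectangle, W, H}) -> W * H.

%% sum/1 was imported from lists, so no lists: prefix is needed here.
total_area(Shapes) -> sum([area(S) || S <- Shapes]).

%% Builds a point record using only the defaults.
origin() -> #point{}.
```

After c(geometry). in the shell, geometry:area({square, 4}) should evaluate to 16.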

Documentation

Comments begin with a % character and continue to the end of the line.
Erlang uses EDoc (inspired by Javadoc). EDoc comments go before a module or a function. Some of the tags that can be used for a module are @author, @copyright, @deprecated, @doc (followed by XHTML), and @version. Some of the tags that can be used for a function are @deprecated, @doc (followed by XHTML), @private, and @spec.

Variables

Erlang is a single-assignment language. That is, once a variable has been given a value, it cannot be given a different value. In this sense it is like algebra rather than like most conventional programming languages.
Variables must begin with a capital letter or an underscore, and are composed of letters, digits, and underscores.
The special variable _ is a "don't care" variable--it does not retain its value. It is as if every occurrence of _ is a new, different variable.
Erlang issues a warning if a variable occurs only once in a function. To eliminate this warning, use an underscore as the first character of the variable name.

Data types

Erlang has:
  • Integers, of unlimited size: 1112223344455666777888999000.
    • Integers may be written in any base from 2 to 36, with the syntax base#number, for example, 16#3FF is 1023.
    • The ASCII value of characters can be written as $c, for example, $A is 65, and $\n is 10.
  • Floats: 1234.5678, 6.0221415e23.
  • Strings, enclosed in double quotes: "This is a string."
    • A string is implemented as a list of ASCII (integer) values; how it is printed depends on whether it contains non-ASCII values.
    • Erlang has no Unicode support, but Unicode strings can be represented as a list of integers.
    • Standard escape sequences, such as \n and \t, may be used in strings.
  • Atoms. An atom stands for itself. It begins with a lowercase letter and is composed of letters, digits, and underscores, or it is any string enclosed in single quotes: atom1, 'Atom 2'.
    • Erlang has no separate "boolean" type, but uses the atoms true and false to represent boolean values.
  • Lists, which are a comma-separated sequence of values enclosed in brackets: [abc, 123, "pigs in a tree"].
  • Tuples, which are a comma-separated sequence of values enclosed in braces: {abc, 123, "pigs in a tree"}.
    • Because the values in tuples are "anonymous," a common technique is to use name-value pairs, with the name being an atom: {{name, "Pat"}, {age, 27}, {gender, female}}.
  • Records, which are not a separate data type, but are just tuples with keys associated with each value. They are declared in a file and defined (given specific values) in the program.
  • Binaries, enclosed in double angle brackets: <<0, 255, 128, 128>>, <<"hello">>, <<X:3, Y:7, Z:6>>. Binaries are sequences of bits; the number of bits in a binary must be a multiple of 8.
    • Erlang has extremely good support for binaries, most of which is beyond the scope of this paper.
  • References are globally unique values, created by calling make_ref().
  • Process identifiers (Pids) are the "names" of processes.
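As a rough illustration of the literal syntax listed above, the following sketch checks a few of the stated equivalences by pattern matching. The module name types_demo is hypothetical.

```erlang
-module(types_demo).
-export([examples/0]).

%% Each line succeeds only if both sides are the same value.
examples() ->
    1023 = 16#3FF,                      % base#number syntax
    65 = $A,                            % character code
    10 = $\n,                           % escaped character code
    "abc" = [97, 98, 99],               % a string is a list of integers
    true = is_atom('Atom 2'),           % quoted atom
    true = is_boolean(false),           % booleans are just atoms
    <<"hi">> = <<104, 105>>,            % a binary written two ways
    true = is_reference(make_ref()),
    true = is_pid(self()),
    ok.
```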

Type tests and conversions

To test for or convert "strings," recall that strings are actually lists of integers.

Type tests

is_atom(X)        is_function(X)       is_number(X)       is_tuple(X)
is_binary(X)      is_function(X, N)    is_pid(X)          is_record(X)
is_constant(X)    is_integer(X)        is_port(X)         is_record(X, Tag)
is_float(X)       is_list(X)           is_reference(X)    is_record(X, Tag, N)

Type conversions

atom_to_list(Atom)        float_to_list(Float)           list_to_binary(List)     round(Float)
binary_to_list(Binary)    integer_to_list(Integer)       list_to_integer(List)    trunc(Float)
float(Integer)            list_to_atom(List)             list_to_tuple(List)
list_to_float(List)       list_to_existing_atom(List)    tuple_to_list(Tuple)
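A few of these conversions in use, as a minimal sketch (convert_demo is a hypothetical module name). Each line fails with a badmatch error if the conversion does not produce the value on the left.

```erlang
-module(convert_demo).
-export([examples/0]).

examples() ->
    "hello" = atom_to_list(hello),
    hello = list_to_atom("hello"),
    42 = list_to_integer("42"),
    "42" = integer_to_list(42),
    3.0 = float(3),
    4 = round(3.7),                     % rounds to the nearest integer
    3 = trunc(3.7),                     % drops the fractional part
    {a, b} = list_to_tuple([a, b]),
    [a, b] = tuple_to_list({a, b}),
    <<1, 2>> = list_to_binary([1, 2]),
    true = is_list("a string"),         % strings are lists
    ok.
```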

Operations

Arithmetic operations
Operation    Description
+X           unary plus
-X           unary minus
X * Y        multiplication
X / Y        division (yields a float)
X div Y      integer division
X rem Y      remainder
X + Y        addition
X - Y        subtraction
Term Comparisons
Comparison   Description
X < Y        less than
X =< Y       equal or less than (not X <= Y !)
X == Y       equal; use only for comparing integers and floats
X /= Y       not equal; use only for comparing integers and floats
X >= Y       greater or equal
X > Y        greater
X =:= Y      equal/identical to
X =/= Y      unequal/not identical to
Any term may be compared with any other term. The ordering is: number < atom < reference < fun < port < pid < tuple < list < binary.
Boolean operations
Operation     Description
not X         not
X and Y       and
X or Y        or
X xor Y       exclusive or
X andalso Y   short-circuit and
X orelse Y    short-circuit or
Bitwise operations
Operation    Description
bnot X       bitwise not
X band Y     bitwise and
X bor Y      bitwise or
X bxor Y     bitwise exclusive or
X bsl N      bitshift left by N
X bsr N      bitshift right by N
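The difference between / and div, between == and =:=, and the short-circuit behaviour of andalso are easy to confuse, so here is a small sketch. The module name ops_demo and the helper safe_div/2 are hypothetical.

```erlang
-module(ops_demo).
-export([examples/0, safe_div/2]).

examples() ->
    2.5 = 5 / 2,                        % / always yields a float
    2 = 5 div 2,                        % integer division
    1 = 5 rem 2,
    true = (1 == 1.0),                  % equal after numeric conversion
    false = (1 =:= 1.0),                % not identical: integer vs. float
    true = (2 =< 3),
    true = (1 < an_atom),               % number < atom in the term ordering
    true = ({a, tuple} < [a, list]),    % tuple < list
    2#1000 = 2#1100 band 2#1010,
    2#1110 = 2#1100 bor 2#1010,
    2#0110 = 2#1100 bxor 2#1010,
    16 = 1 bsl 4,
    ok.

%% andalso evaluates its right operand only when the left one is true,
%% so the division is never attempted when Y is zero.
safe_div(X, Y) ->
    case Y /= 0 andalso X / Y of
        false -> undefined;
        Quotient -> Quotient
    end.
```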

Pattern matching

The pattern matching expression

Pattern matching is the fundamental operation in Erlang. A simple pattern matching expression looks like an assignment statement in other languages:
     pattern = expression.
This says to evaluate the expression, and try to match the result to the pattern. In this context, it is an error if the pattern match does not succeed. Note that every statement in Erlang ends with a period.
In general, pattern matching succeeds in the following cases:
  • The pattern is an unbound variable. When the pattern match succeeds, the variable is bound to the value of the expression.
  • The pattern is a literal, or a variable already bound to a value, and the expression evaluates to the same value.
  • The pattern is a structure (list or tuple) which may contain unbound variables, and the expression results in the same structure; when the pattern match succeeds, the unbound variables become bound to the corresponding parts of the evaluated expression.

Examples

Variable = expression.
The expression is evaluated.
  • If the Variable has no previous value, it is given the value of the expression; this makes the two sides equal, so the pattern match succeeds.
  • If the Variable has a previous value, and it is equal to the value of the expression, then the pattern match succeeds, otherwise it fails.
[H|T] = expression.
If the value of the expression is a nonempty list, H is matched against the head of the list (the first element) and T is matched against the tail of the list (the remaining elements). If either fails to match, or if the expression does not evaluate to a nonempty list, the pattern match fails. Note that H and T may be variables, literals, or expressions.
[H1, H2, ..., HN|T] = expression.
H1, H2, ..., HN are matched against the first N elements of the list, and T is matched against the remaining elements. If any part fails to match, the pattern match fails.
{A, B, C} = {X, Y, Z}.
The expressions on the right are evaluated and compared, in order, against the patterns on the left (that is, A=X, B=Y, C=Z). In order for the pattern match to succeed, the tuples must be the same length, and corresponding parts must match.
#Name{Key = Variable, ..., Key = Variable} = Record.
The Variables are matched against the values of the named Keys in the Record.
<<Pattern:Size, ..., Pattern:Size>> = Binary.
The values in the Binary are unpacked into their component parts and matched against the Patterns.
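The following sketch runs several of these matches in one function and returns everything that was bound. The names match_demo and rematch/1 are hypothetical; rematch/1 shows what happens when an already bound variable is matched against a different value.

```erlang
-module(match_demo).
-export([examples/0, rematch/1]).

examples() ->
    X = 5,                              % X is unbound, so it is bound to 5
    X = 2 + 3,                          % X is bound; same value, so this succeeds
    [H | T] = [1, 2, 3],                % H = 1, T = [2, 3]
    [A, B | Rest] = [a, b, c, d],       % A = a, B = b, Rest = [c, d]
    {Name, Age} = {"Pat", 27},          % tuples of the same length
    <<Hi:4, Lo:4>> = <<16#AB>>,         % one byte split into two 4-bit fields
    {X, H, T, A, B, Rest, Name, Age, Hi, Lo}.

%% X is bound to 5, so matching it against any other value fails
%% with a badmatch error, which is caught here.
rematch(V) ->
    X = 5,
    try X = V, ok
    catch error:{badmatch, _} -> {error, badmatch}
    end.
```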
 

Case expressions

The case expression uses pattern matching, and has the following syntax:
case Expression of
    Pattern1 [when Guard1] -> Expression_sequence1;
    Pattern2 [when Guard2] -> Expression_sequence2;
    ...
    PatternN [when GuardN] -> Expression_sequenceN
end
The brackets indicate that the when part (which is just a condition) is optional. The expression is evaluated, and the patterns are tried, in order. When a pattern matches (and its associated guard, if present, is true), the corresponding expression sequence is evaluated. The value of an expression sequence is the value of the last expression, and that becomes the value of the case.
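A short example of a case expression with and without guards; the module and function names (case_demo, describe/1) are made up for illustration.

```erlang
-module(case_demo).
-export([describe/1]).

%% Clauses are tried top to bottom; the first match whose guard holds wins.
describe(Value) ->
    case Value of
        {ok, N} when is_integer(N), N > 0 -> {positive, N};
        {ok, N} when is_integer(N) -> {non_positive, N};
        {error, Reason} -> {failed, Reason};
        _ -> unknown                    % catch-all, so the case never fails
    end.
```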

If expressions

The if expression is like a case expression without the pattern matching.
if
    Guard1 ->
        Expression_sequence1;
    Guard2 ->
        Expression_sequence2;
    ...
    GuardN ->
        Expression_sequenceN
end
The value of the if expression is the value of the expression sequence that is chosen. The value of an expression sequence is the value of the last expression executed. It is an error if no guard succeeds; hence, it is common to use true as the last guard.
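For instance, a sign function might be sketched as follows (if_demo and sign/1 are hypothetical names). The final true guard plays the role of an "else" branch.

```erlang
-module(if_demo).
-export([sign/1]).

sign(N) ->
    if
        N > 0 -> positive;
        N < 0 -> negative;
        true -> zero                    % always succeeds, so no guard failure is possible
    end.
```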

Guards

Guards may not have side effects. To ensure this, user-defined functions are not allowed in guards. Things that may be used are: type tests, boolean operators, bitwise operators, arithmetic operators, relational operators, and the following BIFs (Built In Functions):
abs(Number)                hd(List)        node(X)          size(TupleOrBinary)
element(Integer, Tuple)    length(List)    round(Number)    trunc(Number)
float(Number)              node()          self()           tl(List)
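A sketch of guards built only from type tests, operators, and the BIFs listed above. The names guard_demo and kind/1 are hypothetical. Within a guard, a comma means "and" and a semicolon means "or".

```erlang
-module(guard_demo).
-export([kind/1]).

kind(X) when is_integer(X), X rem 2 =:= 0 -> even_integer;
kind(X) when is_integer(X) -> odd_integer;
kind(X) when is_list(X), length(X) > 2 -> long_list;
kind(X) when is_tuple(X), size(X) =:= 2 -> pair;
kind(X) when is_atom(X); is_float(X) -> atom_or_float;
kind(_) -> other.
```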

Defining functions

A function is a value, or first-class object. That means it can be assigned to a variable, or given as an argument to a function, or returned as the value of a function.

Named functions

The syntax for a named function is a series of one or more clauses:
name(Patterns1) -> Body1;
name(Patterns2) -> Body2;
...
name(PatternsN) -> BodyN.
where
  • The name and the arity (number of patterns given as parameters) are the same for each clause.
  • Clauses are tried in order until one of the parameter lists (sequence of patterns) matches, then the corresponding Body is evaluated.
  • Each Body consists of a sequence of expressions, separated by commas; the value of the sequence, and therefore the value of the function, is the value of the last expression evaluated.
  • It is an error if no parameter list matches.
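As an example of a function with two clauses, here is a sketch that sums a list (clauses_demo and sum/1 are hypothetical names). Calling it with something that is not a list matches neither clause and raises a function_clause error.

```erlang
-module(clauses_demo).
-export([sum/1]).

sum([]) -> 0;                           % first clause: the empty list
sum([H | T]) -> H + sum(T).             % second clause: head plus sum of the tail
```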

Recursion

Recursion is when a function calls itself, either directly (f calls f) or indirectly (f calls g, which calls h, ..., which calls f). Any program which uses a loop can be rewritten to use recursion, and vice versa. Erlang has no loops, therefore recursion is used heavily.
Here is one way to write the equivalent of a loop in Erlang:
myFunction(Args1) ->
    Args2 = someExpression(Args1),   % placeholder for any computation on Args1
    myFunction(Args2).
Tail recursion is when the recursive call is the very last thing done in the function. As an example, the usual definition of the factorial function,
    factorial(0) -> 1;
    factorial(N) -> N * factorial(N - 1).

is not tail recursive, because a multiplication is performed after the recursive call.
In general, each recursive call adds information to an internal stack; very deep recursions can cause Erlang to run out of memory. Tail recursion is desirable because the compiler can easily change a tail recursion into a loop, which does not add information to the stack, and therefore does not cause memory problems.
Functions that are not tail recursive (such as factorial) can usually be rewritten as tail recursive functions, with the aid of a helper function. As with many optimizations, this is not recommended until proven necessary, because the resultant code is harder to read and understand.
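One common way to do this for factorial is an accumulator passed to a helper function of a different arity. This is a sketch; the module name fact_demo is hypothetical.

```erlang
-module(fact_demo).
-export([factorial/1]).

%% Public entry point: starts the helper with an accumulator of 1.
factorial(N) when is_integer(N), N >= 0 -> factorial(N, 1).

%% Helper: the recursive call is the last thing done, so nothing
%% has to be remembered on the stack between calls.
factorial(0, Acc) -> Acc;
factorial(N, Acc) -> factorial(N - 1, N * Acc).
```

Note that factorial/1 and factorial/2 are different functions, because the arity is part of a function's identity.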

Anonymous functions

The syntax for an anonymous function is
fun(Patterns1) -> Body1;
   (Patterns2) -> Body2;
   ...
   (PatternsN) -> BodyN
end

Functions as first-class objects

Functions are values. That is, they may be assigned to variables, passed as arguments to functions, and returned as the result of functions.
An anonymous function may be used as a literal value. A named function may be referred to by using the syntax fun FunctionName/Arity.
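The sketch below binds anonymous functions to variables, returns one from a function, and passes a named function with the fun Name/Arity syntax. All names (fun_demo, double/1, adder/1) are hypothetical.

```erlang
-module(fun_demo).
-export([examples/0, double/1]).

double(X) -> 2 * X.

%% Returns a new function that remembers N.
adder(N) -> fun(X) -> X + N end.

examples() ->
    Square = fun(X) -> X * X end,
    %% An anonymous function with several clauses and a guard.
    Sign = fun(0) -> zero;
              (N) when N > 0 -> positive;
              (_) -> negative
           end,
    AddTen = adder(10),
    Doubled = lists:map(fun double/1, [1, 2, 3]),
    {Square(4), Sign(-3), AddTen(5), Doubled}.
```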

Lists

A list literal can be written as a bracketed, comma-separated list of values. The values may be of different types. Example: [5, "abc", [3.2, {a, <<255>>}]].
A list comprehension has the syntax [Expression || Generator, GuardOrGenerator, ..., GuardOrGenerator]
where
  • The Expression typically makes use of variables defined by a Generator,
  • Generator provides a sequence of values; it has the form Pattern <- List,
  • Guard is a test that determines whether the value will be used in the Expression.
  • At least one Generator is required; Guards and additional Generators are optional.
Example list comprehension:
     N = [1, 2, 3, 4, 5].
     L = [10 * X + Y || X <- N, Y <- N, X < Y].  % Result is [12,13,14,15,23,24,25,34,35,45]

hd(L) returns the first element in the list L; tl(L) returns the list of remaining elements.

Selected operations on lists

The following operations are predefined.
  • hd(List) -> Element -- Returns the first element of the list.
  • tl(List) -> List -- Returns the list minus its first element.
  • length(List) -> Integer -- Returns the length of the list.
The following functions are in the lists module. To call them, either first import them, or prepend lists: to the function call. The definitions are copied from http://www.erlang.org/doc/man/lists.html. Of these, the operations map, filter, foldl, and seq are the most commonly used.
  • all(Pred, List) -> bool() -- Returns true if Pred(Elem) returns true for all elements Elem in List, otherwise false.
  • any(Pred, List) -> bool() -- Returns true if Pred(Elem) returns true for at least one element Elem in List.
  • append(List1, List2) -> List3 -- Returns a new list List3 which is made from the elements of List1 followed by the elements of List2.
    • lists:append(A, B) is equivalent to A ++ B.
  • dropwhile(Pred, List1) -> List2 -- Drops elements Elem from List1 while Pred(Elem) returns true and returns the remaining list.
  • filter(Pred, List1) -> List2 -- List2 is a list of all elements Elem in List1 for which Pred(Elem) returns true.
    • Example: lists:filter(fun(X) -> X =< 3 end, [3, 1, 4, 1, 6]). % Result is [3,1,1]
  • flatmap(Fun, List1) -> List2 -- Maps Fun to List1 and flattens the result.
  • flatten(DeepList) -> List -- Returns a flattened version of DeepList.
  • foldl(Fun, Acc0, List) -> Acc1 -- Calls Fun(Elem, AccIn) on successive elements Elem of List, starting with AccIn == Acc0. Fun/2 must return a new accumulator which is passed to the next call. The function returns the final value of the accumulator. Acc0 is returned if the list is empty.
    • Example: lists:foldl(fun(X, Y) -> X + 10 * Y end, 0, [1, 2, 3, 4, 5]). % Result is 12345
  • foreach(Fun, List) -> void() -- Calls Fun(Elem) for each element Elem in List. This function is used for its side effects and the evaluation order is defined to be the same as the order of the elements in the list.
  • map(Fun, List1) -> List2 -- Takes a function from As to Bs, and a list of As and produces a list of Bs by applying the function to every element in the list. This function is used to obtain the return values. The evaluation order is implementation dependent.
    • Example: lists:map(fun(X) -> 2 * X end, [1, 2, 3]). % Result is [2,4,6]
  • member(Elem, List) -> bool() -- Returns true if Elem matches some element of List, otherwise false.
  • partition(Pred, List) -> {Satisfying, NonSatisfying} -- Partitions List into two lists, where the first list contains all elements for which Pred(Elem) returns true, and the second list contains all elements for which Pred(Elem) returns false.
  • reverse(List1) -> List2 -- Returns a list with the top level elements in List1 in reverse order.
  • seq(From, To) -> Seq -- Returns a sequence of integers from From to To, inclusive.
  • seq(From, To, Incr) -> Seq -- Returns a sequence of integers which starts with From and contains the successive results of adding Incr to the previous element, until To has been reached or passed (in the latter case, To is not an element of the sequence).
  • sort(List1) -> List2 -- Returns a list containing the sorted elements of List1.
  • takewhile(Pred, List1) -> List2 -- Takes elements Elem from List1 while Pred(Elem) returns true, that is, the function returns the longest prefix of the list for which all elements satisfy the predicate.
  • unzip(List1) -> {List2, List3} -- "Unzips" a list of two-tuples into two lists, where the first list contains the first element of each tuple, and the second list contains the second element of each tuple.
  • zip(List1, List2) -> List3 -- "Zips" two lists of equal length into one list of two-tuples, where the first element of each tuple is taken from the first list and the second element is taken from corresponding element in the second list.
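A few of the functions above, combined in one sketch (lists_demo is a hypothetical module name). Each line is a pattern match, so it doubles as a check of the stated result.

```erlang
-module(lists_demo).
-export([examples/0]).

examples() ->
    Evens = lists:filter(fun(X) -> X rem 2 =:= 0 end, lists:seq(1, 10)),
    [2, 4, 6, 8, 10] = Evens,
    30 = lists:foldl(fun(X, Acc) -> X + Acc end, 0, Evens),
    [1, 4, 7, 10] = lists:seq(1, 10, 3),
    {[1, 2], [3, 4]} = lists:partition(fun(X) -> X < 3 end, [1, 2, 3, 4]),
    [1, 2] = lists:takewhile(fun(X) -> X < 3 end, [1, 2, 3, 1]),
    [3, 1] = lists:dropwhile(fun(X) -> X < 3 end, [1, 2, 3, 1]),
    [{a, 1}, {b, 2}] = lists:zip([a, b], [1, 2]),
    {[a, b], [1, 2]} = lists:unzip([{a, 1}, {b, 2}]),
    [1, 2, 3, 4] = lists:flatten([1, [2, [3]], 4]),
    [1, 2, 3] = lists:append([1], [2, 3]),
    true = lists:all(fun is_integer/1, Evens),
    false = lists:member(3, Evens),
    ok.
```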

Selected operations on strings

Strings are lists of ASCII values, so all the list operations apply. The following are in the string module, so either import them or prepend each function call with string:. The definitions are copied from http://www.erlang.org/doc/man/string.html.
  • len(String) -> Length -- Returns the number of characters in the string.
  • equal(String1, String2) -> bool() -- Tests whether two strings are equal.
  • chr(String, Character) -> Index -- Returns the (1-based) index of the first occurrence of Character in String. 0 is returned if Character does not occur.
  • rchr(String, Character) -> Index -- Returns the (1-based) index of the last occurrence of Character in String. 0 is returned if Character does not occur.
  • str(String, SubString) -> Index -- Returns the (1-based) position where the first occurrence of SubString begins in String. 0 is returned if SubString does not exist in String.
  • rstr(String, SubString) -> Index -- Returns the (1-based) position where the last occurrence of SubString begins in String. 0 is returned if SubString does not exist in String.
  • substr(String, Start) -> Substring -- Returns a substring of String, starting at the position Start, and ending at the end of the string.
  • substr(String, Start, Length) -> Substring -- Returns a substring of String, starting at the position Start, and containing Length characters.
  • strip(String) -> Stripped -- Returns a string where the leading and trailing blanks have been removed.
  • to_float(String) -> {Float,Rest} | {error,Reason} -- Argument String is expected to start with a valid text represented float (the digits being ASCII values). Remaining characters in the string after the float are returned in Rest.
  • to_integer(String) -> {Int,Rest} | {error,Reason} -- Argument String is expected to start with a valid text represented integer (the digits being ASCII values). Remaining characters in the string after the integer are returned in Rest.
  • to_lower(String) -> Result -- Returns a string in which uppercase characters have been converted to lowercase.
  • to_upper(String) -> Result -- Returns a string in which lowercase characters have been converted to uppercase.
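A sketch of some of these calls follows; string_demo is a hypothetical module name. One caveat, stated as an assumption about the reader's setup: newer Erlang/OTP releases document several of these functions (len, chr, str, substr, strip, and others) as obsolete in favour of a newer string API, so the compiler or documentation may steer you elsewhere.

```erlang
-module(string_demo).
-export([examples/0]).

examples() ->
    5 = string:len("hello"),
    3 = string:chr("hello", $l),                    % first l, 1-based
    4 = string:rchr("hello", $l),                   % last l
    7 = string:str("hello world", "wor"),
    0 = string:str("hello world", "xyz"),           % not found
    "world" = string:substr("hello world", 7),
    "hello" = string:substr("hello world", 1, 5),
    "hi" = string:strip("  hi  "),
    {42, " apples"} = string:to_integer("42 apples"),
    {error, no_integer} = string:to_integer("apples"),
    "HELLO" = string:to_upper("hello"),
    ok.
```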

Records

Records are declared in a file with the syntax -record(Name, {Key1 = Default1, ..., KeyN = DefaultN}).
To read the record declarations from a file into the shell, use the shell function rr("records.hrl").
To define a record, use the syntax
          Variable1 = #Name{Key = Value, ..., Key = Value}.
The default value is used for any omitted Key=Value pairs. A new, modified record may be created with the syntax
          Variable2 = Variable1#Name{Key = Value, ..., Key = Value}.
Values may be extracted from a record by using pattern matching:
          #Name{Key = Variable, ..., Key = Variable} = Record.
This assigns to the Variables the corresponding Values in the Record.
Pattern matching may be used in function definitions:
          FunctionName(#Name{Key = Variable, ..., Key = Variable} = Variable) -> FunctionBody.
This makes the selected values, and the entire record (the last Variable) available in the function body.
A record is actually a tuple; the keys are just syntactic sugar available to the compiler. The function rf(Record) tells Erlang to drop the keys and treat the variable Record as the tuple {Name, Variable1, ..., VariableN}. This changes the appearance of the variable in the program, not its actual value.
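Putting the record syntax together, here is a minimal sketch. The person record, its fields, and the module name record_demo are all hypothetical.

```erlang
-module(record_demo).
-export([examples/0, birthday/1]).

-record(person, {name, age = 0, city = "unknown"}).

%% Matches the age out of the record and also keeps the whole record in P,
%% then returns a modified copy; P itself is unchanged.
birthday(#person{age = Age} = P) ->
    P#person{age = Age + 1}.

examples() ->
    Pat = #person{name = "Pat", age = 27},          % city takes its default
    Older = birthday(Pat),
    #person{name = Name, age = NewAge, city = City} = Older,
    %% The last element shows the underlying tuple representation.
    {Name, NewAge, City, Older}.
```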

The process dictionary

The process dictionary is a mutable hash table that is private to the current process. Keys are usually atoms; the value associated with a key may be changed. The use of a process dictionary negates many of the advantages of a single-assignment functional language, hence its use is strongly discouraged. Supplied operations are:
put(Key, Value) -> OldValue
Associates the Value with the Key, returning the previous value associated with the Key, or the atom undefined.
get(Key) -> Value
Returns the Value currently associated with the Key, or the atom undefined.
get() -> [{Key, Value}, ..., {Key, Value}]
Returns a list of all Key/Value tuples.
get_keys(Value) -> [Key, ..., Key]
Returns a list of Keys having the given Value.
erase(Key) -> Value
Returns the Value currently associated with the Key, or the atom undefined, and removes the Key/Value pair from the process dictionary.
erase() -> [{Key, Value}, ..., {Key, Value}]
Returns a list of all Key/Value tuples, and erases the contents of the process dictionary.
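For completeness, a sketch of the single-key operations (pdict_demo is a hypothetical module name). It assumes the key color is not already present in the calling process's dictionary.

```erlang
-module(pdict_demo).
-export([examples/0]).

examples() ->
    undefined = put(color, red),        % no previous value
    red = put(color, blue),             % returns the value being replaced
    blue = get(color),
    [color] = get_keys(blue),           % keys whose value is blue
    blue = erase(color),                % removes the pair, returns the value
    undefined = get(color),
    ok.
```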

Concurrency

Concurrent programming is very simple in Erlang. There are three primitives:
Pid = spawn(Fun)
Creates and starts a new process ("Actor") and tells it to evaluate Fun. The new process is a very lightweight thread, managed by Erlang, not an operating system process. A previously defined function may be passed in with the syntax fun FunctionName/arity.
Pid ! Message
Sends the Message to the Pid process. This is an asynchronous operation, that is, execution continues without waiting for a reply. The value of the expression is the Message itself.
receive
  Pattern1 [when Guard1] -> Expression_sequence1;
  Pattern2 [when Guard2] -> Expression_sequence2;
  ...
  PatternN [when GuardN] -> Expression_sequenceN
after Timeout ->
  TimeoutExpressionSequence
end
The semantics are similar to those of the case expression. The syntax is a bit complex so that a process can handle messages of many different types.
The after clause is optional. If used:
  • A positive Timeout will cause Erlang to execute the TimeoutExpressionSequence if no matching message is received within that number of milliseconds.
  • A zero Timeout will cause Erlang to handle a matching message if one is already in the mailbox; otherwise it immediately executes the TimeoutExpressionSequence.
  • A Timeout of the atom infinity will cause the TimeoutExpressionSequence to never be executed.
If there are no patterns between receive and after, the statement "sleeps" for the given number of milliseconds.
Every process has a mailbox. Messages go into the mailbox on a first-come, first-served basis. When the receiving process examines the mailbox (with the receive statement), it takes the first message that can be matched by some Pattern, and executes the corresponding Expression_sequence. Unmatched messages are left in the mailbox.
To send a message to a process, you must know its process id (Pid). If you want the process to send you back a response, you must tell it your Pid, usually as part of the message; for example, Pid ! {MyPid, MessageData}.
It is also possible to register a Pid, thus making it globally available. Here are the BIFs (built-in functions) for doing that:
  • register(AnAtom, Pid) -- gives the Pid a globally accessible "name," AnAtom.
  • unregister(AnAtom) -- removes the registration. If a registered process dies, it is automatically unregistered.
  • whereis(AnAtom) -> Pid | undefined -- gets the Pid of a registered process, or undefined if no such process.
  • registered() -> [AnAtom :: atom()] -- returns a list of all registered processes.
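The three primitives and the registration BIFs can be combined into a tiny echo server. This is a sketch under assumptions: the module name echo_demo, the message shapes {From, {echo, Msg}} and stop, the registered name echo_server, and the 1000 ms timeout are all invented for illustration.

```erlang
-module(echo_demo).
-export([start/0, start_registered/0, call/2, stop/1, sleep/1]).

%% Spawns a new process that runs loop/0.
start() -> spawn(fun loop/0).

%% Same, but also registers the process under the atom echo_server.
start_registered() ->
    Pid = start(),
    true = register(echo_server, Pid),
    Pid.

%% The server: replies to echo requests until it is told to stop.
loop() ->
    receive
        {From, {echo, Msg}} ->
            From ! {self(), Msg},       % reply to the sender's Pid
            loop();                     % tail call: wait for the next message
        stop ->
            ok
    end.

%% The client side: send our own Pid along, then wait for the reply.
call(Pid, Msg) ->
    Pid ! {self(), {echo, Msg}},
    receive
        {Pid, Reply} -> {ok, Reply}
    after 1000 ->
        timeout
    end.

stop(Pid) -> Pid ! stop.

%% A receive with no patterns simply waits for the timeout.
sleep(Milliseconds) ->
    receive after Milliseconds -> ok end.
```

Because unmatched messages stay in the mailbox, the {Pid, Reply} pattern in call/2 picks out only the reply from the server that was asked.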

Exceptions

In addition to exceptions resulting from program errors, there are three kinds of exceptions that the programmer can deliberately generate:
  • exit(Reason) -- exits the current process, and broadcasts {'EXIT', Pid, Reason} to all linked processes.
  • throw(Reason) -- throws an exception that the caller might want to catch.
  • erlang:error(Reason) -- indicates a fatal error.
Code which might throw an exception can be placed in a try..catch statement, with this syntax:
try FunctionOrExpressionSequence of
    Pattern1 [when Guard1] -> Expressions1;
    ...
catch
    ExType1:ExPattern1 [when ExGuard1] -> ExExpressions1;
    ...
after
    AfterExpressions
end
and these semantics:
  • The FunctionOrExpressionSequence is evaluated,
    • If it completes successfully,
      • Its value is compared against the Patterns,
      • the Expression associated with the first matching Pattern is evaluated, and this is the value of the try..catch.
        • The when guards are optional.
    • If an exception occurs,
      • The first matching catch clause is evaluated
        • The ExTypes must be one of throw (default if omitted), exit, or error.
          • Special case: The syntax _:_ will catch every possible exception.
        • The value of the try..catch is the value of the corresponding ExExpression.
  • In any event, the (optional) AfterExpressions are executed, but the resulting value is discarded.
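A sketch that exercises each exception type; try_demo and classify/1 are hypothetical names. The argument F is any function of no arguments.

```erlang
-module(try_demo).
-export([classify/1]).

classify(F) ->
    try F() of
        Value -> {ok, Value}            % F() completed normally
    catch
        throw:Reason -> {thrown, Reason};
        exit:Reason -> {exited, Reason};
        error:Reason -> {failed, Reason}
    after
        ok                              % always evaluated; its value is discarded
    end.
```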

Input/Output

As with most languages, there are a lot of I/O routines. Files can be read as binary, as a sequence of lines, or as Erlang terms. This paper describes only line-oriented I/O.
On output, data is interpolated (inserted) into the FormatString at the following locations (excluding ~n):
  • ~s   Print as a string.
  • ~w   Print any value in "standard syntax". Strings are printed as lists of integers.
  • ~p   Pretty print any value (breaking lines, indenting, etc.)
  • ~n   Print a newline.

Input from the console

Line = io:get_line(Prompt). % Prompt is a string or an atom

Output to the console

io:format(FormatString, ListOfData).

Input from a file

{ok, S} = file:open(FileName, read).
Line = io:get_line(S, ''). % May return eof
file:close(S).

Output to a file

{ok, S} = file:open(FileName, write).
io:format(S, FormatString, ListOfData).
file:close(S).
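The file operations above can be combined into a write-then-read round trip. This is a sketch: io_demo and round_trip/1 are hypothetical names, the mode is given in list form ([write], [read]), and the function overwrites whatever file name it is given.

```erlang
-module(io_demo).
-export([round_trip/1]).

%% Writes one formatted line to FileName, reads it back, and returns it.
round_trip(FileName) ->
    {ok, Out} = file:open(FileName, [write]),
    io:format(Out, "~s is ~w years old~n", ["Pat", 27]),
    ok = file:close(Out),
    {ok, In} = file:open(FileName, [read]),
    Line = io:get_line(In, ''),         % the line, including its newline
    eof = io:get_line(In, ''),          % nothing left to read
    ok = file:close(In),
    Line.
```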

Articles