20130814

Python 3: What's New


Source: http://www.ibm.com/developerworks/library/l-python3-1/

Python version 3, also known as Python 3000 or Py3K (a nickname that's a pun on the Microsoft® Windows® 2000 operating system), is the latest version of Guido van Rossum's general-purpose programming language. Although many improvements have been made to the core language, the new version will break backwards compatibility with the 2.x line. Other changes have been anticipated for a while, such as:
  • True division—for example, 1/2 returns .5.
  • The long and int types have been unified into a single type, with the trailing L now removed.
  • True, False, and None are now keywords.
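The three changes listed above can be checked in a Python 3 interpreter. The following is a minimal sketch; the variable names are only illustrative.

```python
# True division: / always returns a float; // keeps the old floor behaviour.
half = 1 / 2
floored = 1 // 2

# int and long are one type: integers grow as needed, with no trailing L.
big = 2 ** 100

# True, False, and None are keywords, so assigning to them is a syntax error.
try:
    compile("None = 0", "<example>", "exec")
    assignable = True
except SyntaxError:
    assignable = False

print(half, floored, type(big).__name__, assignable)
```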
This article, the first in a series on Python 3, covers the new print() function, input(), changes to input/output (I/O), the new bytes data type, changes to strings and string formatting, and finally, changes to the built-in dict type. This article is meant for programmers already familiar with Python who are curious about the changes but don't want to wade through the long list of Python Enhancement Proposals (PEPs).
You'll have to retrain your fingers to stop typing print "hello" and start typing print("hello"), because print is now a function, not a statement. I know, it's painful. Every Python programmer I know screams in agony as soon as they install version 3 and get a SyntaxError. I know the extra two characters are annoying; I know this will break backwards compatibility. But there are advantages.
Consider cases in which you need to redirect standard output (stdout) to a log. The following example opens the file log.txt for appending and assigns the object to fid. A string is then redirected to the file, fid, using print>>:
>>>fid = open("log.txt", "a")
>>>print>>fid, "log text"

Another example is to redirect to standard error (sys.stderr):
>>>import sys
>>>print>>sys.stderr, "an error occurred"

Both of the previous examples are nice, but there's a better solution. The new syntax is now to simply pass in a value to the keyword argument file in the print() function. For example:
>>>fid = open("log.txt", "a")
>>>print("log text", file=fid)

This code has much cleaner syntax. Another advantage is the ability to change the separator by passing a string to the sep keyword argument and change the end string by passing another string to the end keyword argument. To change the separator, you could use:
>>>print("Foo", "Bar", sep="%")
Foo%Bar

In general, the new syntax is:
print([object, ...][, sep=' '][, end='endline_character_here'][, file=redirect_to_here])

where the code inside the square brackets ([]) is optional. By default, print() ends its output with a newline character (\n).
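As a small sketch of the sep, end, and file keyword arguments working together, the example below prints into an in-memory io.StringIO object that stands in for a log file (the name buffer is only illustrative):

```python
import io
import sys

buffer = io.StringIO()          # in-memory stand-in for a log file

print("Foo", "Bar", sep="%", file=buffer)       # custom separator
print("no newline", end="", file=buffer)        # suppress the trailing newline
print("an error occurred", file=sys.stderr)     # replaces print >>sys.stderr

captured = buffer.getvalue()
print(repr(captured))
```

Because print() writes to any object with a write() method, the same call works for open files, sys.stderr, and in-memory buffers.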
In Python version 2.x, raw_input() reads an input from standard input (sys.stdin) and returns a string with the trailing newline character stripped from the end. The following example uses raw_input() to grab a string from the command prompt, then assigns the value to quest.
>>>quest = raw_input("What is your quest? ")
What is your quest? To seek the holy grail.
>>>quest
'To seek the holy grail.'

In contrast, the input() function in Python 2.x expects a valid Python expression, such as 3+5.
Originally, it was suggested that both input() and raw_input() be removed from the Python built-in namespace altogether, thereby requiring an import to have any kind of input capability. This seemed pedagogically unsound; suddenly, simply typing:
>>>quest = input("What is your quest?")

would have turned into:
>>>import sys
>>>print("What is your quest?")
>>>quest = sys.stdin.readline()

which is far more verbose for just a simple input and a lot more to explain to a novice. This would have required teaching what modules and imports are, printing a string, and the dot operator. (This suspiciously feels a little too much like the Java™ language...) So, in Python 3, raw_input() is renamed input(), and no import is required to get data from the standard input. If you need to retain the version 2.x input() functionality, use eval(input()), which works identically.
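The difference between the two behaviours can be sketched without typing at a prompt. The ask() helper below is hypothetical: it temporarily replaces sys.stdin with an io.StringIO object so that input() reads a canned answer. Note that eval() runs whatever expression it is given, so it should only be used on trusted input.

```python
import io
import sys

def ask(prompt, typed):
    """Call input() with `typed` standing in for what the user types."""
    original_stdin = sys.stdin
    sys.stdin = io.StringIO(typed + "\n")
    try:
        return input(prompt)        # always returns a str in Python 3
    finally:
        sys.stdin = original_stdin

quest = ask("What is your quest? ", "To seek the holy grail.")
number = eval(ask("Enter an expression: ", "3+5"))   # Python 2 input() behaviour

print()
print(repr(quest), number)
```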
The new data type, the bytes literal, as well as the bytes object are used for storing binary data. This object is an immutable sequence of integers between 0 and 255. Really, it's an immutable version of the bytearray object from version 2.6. A bytes literal is a string with a b before it, for example b'byte literal', and may contain only ASCII characters; other byte values are written as escapes such as \xc3. Evaluation of a bytes literal produces a new bytes object. You can create a new bytes object with the bytes() function. The constructor for a bytes object is:
bytes([initializer[, encoding]])

For example:
>>>b = bytes(b'\xc3\x9f\x65\x74\x61')
>>>print(b)
b'\xc3\x9feta'

creates a bytes object but is redundant, because you can create a bytes object by simply assigning a byte literal. (I just wanted to demonstrate that you can do this: I'm not actually suggesting you do it.) If you wanted to use an iso-8859-1 encoding, you could try this:
>>>b = bytes('\xc3\x9f\x65\x74\x61', 'iso-8859-1')
>>>print(b)
b'\xc3\x9feta'

If the initializer is a string, you must provide an encoding. If the initializer is a bytes literal, you need not specify the encoding type: Remember, bytes literals are not strings. But like strings, you can concatenate bytes:
>>>b'hello' b' world'
b'hello world'

You use the bytes type to represent both binary data and encoded text. To convert from bytes to str, the bytes object must be decoded with its decode() method, which assumes UTF-8 unless you name another encoding. For example:
>>>b'\xc3\x9f\x65\x74\x61'.decode()
'ßeta'
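
To make the "immutable sequence of integers" description concrete, here is a short sketch using the same byte values as above. The variable names are illustrative only.

```python
data = b'\xc3\x9feta'

first = data[0]                 # indexing yields an int, not a 1-byte string
values = list(data)             # [195, 159, 101, 116, 97]

try:
    data[0] = 0                 # bytes objects are immutable
    mutable = True
except TypeError:
    mutable = False

scratch = bytearray(data)       # bytearray is the mutable counterpart
scratch[0:2] = b'b'             # replace the two-byte UTF-8 sequence
text = data.decode('utf-8')

print(first, values, mutable, bytes(scratch), text)
```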

You can also read binary data directly from a file. The code:
>>>data = open('data.txt', 'rb').read()
>>>print(data) # data is a bytes object, not a string
>>># content of data.txt printed out here

opens for reading a file object in binary mode and reads in the entire file.
Python 3 has a single string type, str, that behaves similarly to the version 2.x unicode type. In other words, all strings are Unicode strings. Also, and very conveniently for non-Latin text users, non-ASCII identifiers are now permissible. For example:
>>>césar = ["author", "consultant"]
>>>print(césar)
['author', 'consultant']

In previous versions of Python, the repr() function escapes the non-ASCII bytes of an 8-bit string. For example:
>>>repr('é')
"'\\xc3\\xa9'"

It now returns a unicode string:
>>>repr('é')
"'é'"

which, as mentioned earlier, is the built-in string type.
String objects and bytes objects are incompatible. If you want the string representation of a bytes object, use its decode() method. Conversely, use the encode() method of a string object if you want a bytes object from that string.
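The incompatibility is easy to demonstrate. This sketch round-trips a string through encode() and decode() and shows that mixing the two types raises a TypeError:

```python
text = "\u00dfeta"                      # the string 'ßeta'
encoded = text.encode("utf-8")          # str -> bytes
decoded = encoded.decode("utf-8")       # bytes -> str

try:
    text + encoded                      # mixing the two types is an error
    mixed = True
except TypeError:
    mixed = False

print(encoded, decoded == text, mixed)
```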
Many Python programmers felt that the built-in % operator for formatting strings was too constrained, because:
  • It is a binary operator and can take at most two arguments.
  • Apart from the format string argument, all other arguments must be squeezed in with either a tuple or a dictionary.
This style is somewhat inflexible, so Python 3 introduces a new way of doing string formatting. (Both the % operator and the string.Template class are retained in version 3.) String objects now have a method, format(), that accepts positional and keyword arguments, which are passed into replacement fields. Replacement fields are denoted by curly brackets ({}) inside a string. The element inside a replacement field is simply called a field. Here's a simple example:
>>>"I love {0}, {1}, and {2}".format("eggs", "bacon", "sausage")
'I love eggs, bacon, and sausage'

The fields {0}, {1}, and {2} are filled with the positional parameters eggs, bacon, and sausage passed to the format() method. The following example shows how to use format() with keyword arguments:
>>>"I love {a}, {b}, and {c}".format(a="eggs", b="bacon", c="sausage")
'I love eggs, bacon, and sausage'

Here's another example that combines positional parameters and keyword arguments:
>>>"I love {0}, {1}, and {param}".format("eggs", "bacon", param="sausage")
'I love eggs, bacon, and sausage'

Remember that it's a syntax error to have a non-keyword argument after a keyword argument. To escape the curly braces, double them, like so:
>>>"{{0}}".format("can't see me")
'{0}'

The positional parameter can't see me isn't printed, because there's no field to print to. Note that this does not cause an error.
The new format() built-in function formats a single value. For example:
>>>print(format(10.0, "7.3g"))
     10

Here, g stands for general format, which chooses between fixed-point and exponent notation depending on the magnitude of the number. The number before the dot specifies the minimum field width, and the number after the dot specifies the precision (for g, the number of significant digits). The complete syntax for format specifiers is beyond the scope of this article; the Format Specification Mini-Language section of the Python documentation covers it in full.
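A few more format specifiers, shown as a sketch rather than a complete tour. The same width, precision, and alignment syntax works both in the format() built-in and after the colon inside a replacement field:

```python
general = format(10.0, "7.3g")        # width 7, 3 significant digits
fixed = format(3.14159, "7.3f")       # width 7, 3 digits after the point
padded = "{0:>8}|{1:<8}|".format("eggs", "bacon")   # right- and left-aligned
named = "{item:^9}".format(item="spam")             # centered keyword field

print(repr(general), repr(fixed), repr(padded), repr(named))
```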
Another major change in 3.0 is the removal of the dict.iterkeys(), dict.itervalues(), and dict.iteritems() methods in dictionaries. Instead, you use .keys(), .values(), and .items(), which have been revamped to return lightweight view objects (set-like for keys and items) instead of a list that's a copy of the keys or values. The advantage here is the ability to perform set operations on keys and items without having to copy them. For example:
>>>d = {1:"dead", 2:"parrot"}
>>>print(d.items())
dict_items([(1, 'dead'), (2, 'parrot')])

Note: In Python, sets are unordered collections of unique elements.
Here, I created a dictionary with two keys and values, then printed the result of d.items(), which returns a view object, not a list. You can test the membership of an element just as with a set object:
>>>1 in d.keys() # test for membership
True

Here's an example of iterating over the dict_values object:
>>>for value in d.values():
...     print(value)
...
dead
parrot

But, if you really want a list, you can always convert the returned view object. For example:
>>>keys = list(d.keys())
>>>print(keys)
[1, 2]
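
The set operations mentioned above can be sketched as follows. The second dictionary, other, is made up for illustration. Note that a view is not a snapshot: it reflects later changes to the dictionary.

```python
d = {1: "dead", 2: "parrot"}
other = {2: "norwegian", 3: "blue"}     # hypothetical second dictionary

keys = d.keys()                         # a view, not a copy
shared = keys & other.keys()            # set intersection on key views
combined = keys | other.keys()          # set union

d[4] = "pining"                         # views track later changes to the dict
has_new_key = 4 in keys

print(shared, sorted(combined), has_new_key, list(d.values()))
```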

Metaclasses

According to Wikipedia, "a metaclass is a class whose instances are classes." I explore this concept in more detail in part 2 of this series.
Before delving into the new mechanisms for I/O, it's necessary to review abstract base classes (ABCs). A more in-depth treatment is provided on this topic in the second part of this series.
ABCs declare an interface for their subclasses. An ABC that defines abstract methods can't be instantiated: to use it, a subclass must inherit from the ABC and override its abstract methods. A method is abstract if it's preceded with the decorator @abstractmethod. The new ABC framework also provides the @abstractproperty decorator for defining abstract properties. You access the new framework by importing the standard library module abc. Listing 1 provides a simple example.

Listing 1. A simple abstract base class
from abc import ABCMeta

class SimpleAbstractClass(metaclass=ABCMeta):
    pass

SimpleAbstractClass.register(list)

assert isinstance([], SimpleAbstractClass)

The register() method call takes a class as an argument and makes the registered class a virtual subclass of the ABC. The assert statement on the last line verifies this. Listing 2 provides another example that uses decorators.

Listing 2. An implemented abstract base class with decorators
from abc import ABCMeta, abstractmethod

class abstract(metaclass=ABCMeta):
    @abstractmethod
    def absMeth(self):
        pass
 
class A(abstract):
    # must implement abstract method
    def absMeth(self):
        return 0
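
To see what the abstract method enforces, the following sketch tries to instantiate an ABC directly. The class names Shape and Square are hypothetical and not part of the listings above.

```python
from abc import ABCMeta, abstractmethod

class Shape(metaclass=ABCMeta):         # hypothetical ABC
    @abstractmethod
    def area(self):
        """Return the area of the shape."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):                     # overrides the abstract method
        return self.side ** 2

try:
    Shape()                             # abstract methods are still missing
    instantiated = True
except TypeError:
    instantiated = False

Shape.register(tuple)                   # tuple becomes a virtual subclass
print(instantiated, Square(3).area(), issubclass(tuple, Shape))
```

A registered class such as tuple passes issubclass() and isinstance() checks but inherits nothing from the ABC, so registration is a promise about the interface rather than a check of it.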

Now that you know about ABCs, let's continue with the new I/O system. Previous Python releases lacked important yet exotic functions, such as seek(), for some stream-like objects. Stream-like objects are file-like objects with read() and write() methods: sockets or files, for example. Python 3 has multiple layers for I/O on stream-like objects (a raw I/O layer, a buffered I/O layer, and a text I/O layer), each defined with its own ABC and implementations.
You still open a stream using the built-in open(fileName) function, although you can also call io.open(fileName). Doing so returns a buffered text file; read() and readline() return strings. (Remember that all strings in Python 3 are unicode.) You can also open a buffered binary file by adding b to the mode, as in open(fileName, 'rb'). In this case, read() and readline() return bytes.
The signature of the built-in open() function is:
open(file, mode="r", buffering=None, encoding=None, errors=None, newline=None, closefd=True)

The possible modes are:
  • r: Reading
  • w: Open for writing
  • a: Open for appending
  • b: Binary mode
  • t: Text mode
  • +: Open a disk file for updating
  • U: Universal newline mode
The default mode is rt: open for reading in text mode.
The buffering keyword argument expects an integer that determines the buffering policy:
  • 0: Switches buffering off (allowed only in binary mode)
  • 1: Line buffering
  • > 1: Full buffering (default)
The default encoding is platform dependent. The close file descriptor argument, closefd, can be True or False. If False, the underlying file descriptor is kept open after the file is closed. This works only when open() is given a file descriptor; when a file name is given, closefd must be True.
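A minimal sketch of closefd=False, assuming a scratch file created with tempfile for the purpose. The file object wraps an existing descriptor obtained from os.open() and leaves it open when the with block ends, so the caller remains responsible for closing it.

```python
import os
import tempfile

# Create a small scratch file to read back.
handle, path = tempfile.mkstemp()
os.write(handle, b"spam\n")
os.close(handle)

fd = os.open(path, os.O_RDONLY)             # raw OS-level file descriptor
with open(fd, "r", closefd=False) as f:     # wrap it without taking ownership
    line = f.readline()

still_open = True
try:
    os.fstat(fd)                            # succeeds only if fd is still open
except OSError:
    still_open = False

os.close(fd)                                # the caller closes the descriptor
os.remove(path)
print(repr(line), still_open)
```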
The object that open() returns depends on the mode you set. Table 1 shows the return types.

Table 1. Return types for different open modes
Mode                Object returned
Text mode           TextIOWrapper
Read binary         BufferedReader
Write binary        BufferedWriter
Append binary       BufferedWriter
Read/write binary   BufferedRandom
Note: Text mode can be w, r, wt, rt, and so on.
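The mapping in Table 1 can be checked with a short loop. This is a sketch that assumes a temporary scratch file; it records the class name of the object open() returns for each mode.

```python
import os
import tempfile

# Scratch file so that the read modes have something to open.
handle, path = tempfile.mkstemp()
os.close(handle)

returned = {}
for mode in ("w", "r", "wb", "ab", "rb", "r+b"):
    with open(path, mode) as f:
        returned[mode] = type(f).__name__

os.remove(path)
for mode, name in returned.items():
    print(mode, name)
```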
The example in Listing 3 opens a buffered binary stream for reading.

Listing 3. Open a buffered binary stream for reading
>>>import io
>>>f = io.open("hashlib.pyo", "rb")  # open for reading in binary mode
>>>f                                 # f is a BufferedReader object 
<io.BufferedReader object at 0xb7c2534c>
>>>f.close()                         # close stream

The BufferedReader object has access to several useful methods, such as isatty, peek, raw, readinto, readline, readlines, seek, seekable, tell, writable, write, and writelines, to name a few. To see the full list, run a dir() on a BufferedReader object.
Whether the Python community will accept version 3 is anyone's guess. The breaking of backwards compatibility will mean supporting two different versions in parallel. Some project developers may not want to migrate their projects, even with the 2to3 converter. Personally, I found that migrating from Python version 2 to 3 was primarily a matter of relearning a few things: It certainly wasn't as drastic a change as moving from Python to, say, the Java or Perl languages. Many of the changes have been long anticipated, such as true division and changes to dict. Performing a print() is a whole lot easier than System.out.println() in Java, so the learning curve is relatively small and there are advantages to be gained.
I'm guessing from reading entries in the blogosphere that many Pythonistas consider some of the changes—such as the backwards compatibility break—deal breakers. Lambda had originally been scheduled for removal but has been retained, and in its original form. For the complete list of things that are staying, visit the Python core development site. If you're adventurous enough to rummage through the PEPs, you can find more in-depth information there.
The next installment in this series will cover more advanced topics, such as metaclass syntax, ABCs, decorators, integer literal support, base types, and exceptions.

Cesar Otero is a freelance Java and Python consultant. He holds a degree in electrical engineering with a minor in mathematics.