Many of your programs will need to use persistent storage, i.e. files, to read from and/or write to.
Although we have seen how to read from a file already, we will revisit the topic and introduce the write capability.
In many of today's programming languages, the idea of a stream is used to describe IO (Input/Output) operations. The analogy is to a stream of water which flows downhill.
A stream is nothing more than an abstraction of the file itself. When you use a stream, you are basically interacting with a file.
An input stream of course is used to stream input from a device, whether it be the keyboard or in this discussion a file.
Similarly an output stream is used to stream output to a device, again be it the screen or a file.
When a stream is opened, there are basically three modes to choose from:
read - data is read from the file into the stream and then accessed by the program.
write - program writes data to the stream, which is then flushed (written) to the file.
update - stream can be used for both read and write.
A file handle is an object that refers to the file. It is the stream abstraction I mentioned above. You use this in your code to access the file.
For example:
open(file, mode='r', encoding=None)
The open() function creates and returns a file object. If the file cannot be opened, an OSError exception is raised.
The optional mode string specifies the mode in which the file is opened. It defaults to 'r', which means open for reading in text mode. 'w' writes to a new file, truncating the file if it already exists. 'x' is used for exclusive creation. 'a' is for appending to an existing file, or creating a new file.
The available modes are:
'r' - open for reading (default)
'w' - open for writing, truncating the file first
'x' - open for exclusive creation, failing if the file already exists
'a' - open for writing, appending to the end of the file if it exists
'b' - binary mode
't' - text mode (default)
'+' - open for updating (reading and writing)
The default mode is 'r' (open for reading text, a synonym of 'rt').
Modes 'w+' and 'w+b' open and truncate the file.
Modes 'r+' and 'r+b' open the file with no truncation.
If reading or writing a binary file, simply append 'b' to the mode, e.g. 'r+b'.
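The difference between 'w', 'a' and 'x' is easiest to see by running them against the same file. The following is a minimal sketch; the file name demo.txt and the use of a temporary directory are assumptions made purely for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # 'w' creates the file (or truncates it if it already exists).
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing contents and writes at the end.
    with open(path, 'a') as f:
        f.write('second\n')

    with open(path, 'r') as f:
        after_append = f.read()

    # 'x' refuses to overwrite: the file exists, so open() raises.
    try:
        open(path, 'x')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'w' again: the earlier contents are discarded.
    with open(path, 'w') as f:
        f.write('third\n')

    with open(path, 'r') as f:
        after_truncate = f.read()

print(repr(after_append))
print(repr(after_truncate))
print(exclusive_failed)
```

Printing with repr() makes the newline characters visible, so you can see exactly what each mode left in the file.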
.read() - Read one character at a time
You can use the .read() method to read one or more characters at a time. When the end of the file is reached, the method returns an empty string.
charCount = 0
fin = open('rules.txt', 'r')
ch = fin.read(1)  # read one character
while ch != '':
    print(ch, end='')
    charCount += 1
    ch = fin.read(1)
fin.close()
print(f'\n\n{charCount} characters read.')
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.

131 characters read.
.read() - Read the whole file at once
The .read() method can also be used to read the entire file at once. Since the whole file is returned as a single string, make sure the file is small enough to fit comfortably in memory.
with open('rules.txt', 'r') as fin:
    fileContents = fin.read()
charCount = len(fileContents)
for ch in fileContents:
    print(ch, end='')
print(f'\n\n{charCount} characters read.')
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.

131 characters read.
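When a file may be too large to read in one call, .read() can be given a size and called repeatedly. The sketch below is one possible approach; the helper name count_chars, the chunk size, and the temporary sample file are all assumptions for illustration.

```python
import os
import tempfile

def count_chars(path, chunk_size=4096):
    """Count the characters in a text file without loading it all at once."""
    total = 0
    with open(path, 'r') as fin:
        chunk = fin.read(chunk_size)
        while chunk != '':          # an empty string signals end of file
            total += len(chunk)
            chunk = fin.read(chunk_size)
    return total

# Demonstrate on a small temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')  # hypothetical file name
    with open(sample, 'w') as fout:
        fout.write('abc\n' * 1000)
    n_chars = count_chars(sample, chunk_size=64)

print(n_chars)
```

Only one chunk is held in memory at a time, so the same function works regardless of the file's size.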
.readline() - Read one line at a time
The .readline() method reads a single line, returning an empty string when it reaches the end of the file.
charCount, lineCount = 0, 0
fin = open('rules.txt', 'r')
line = fin.readline()
while line != '':
    lineCount += 1
    charCount += len(line)
    print(line, end='')
    line = fin.readline()
fin.close()
print(f'\n\n{lineCount} lines read.')
print(f'{charCount} characters read.')
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.

4 lines read.
131 characters read.
This version, which iterates over the file object directly with a for loop, is the one we have been using in our previous discussions.
The use of the with statement guarantees that the file will be closed for us, so there is one less thing to worry about.
charCount, lineCount = 0, 0
with open('rules.txt', 'r') as fin:
    for line in fin:
        lineCount += 1
        charCount += len(line)
        print(line, end='')
print(f'\n\n{lineCount} lines read.')
print(f'{charCount} characters read.')
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.

4 lines read.
131 characters read.
Newline (\n) treatment
Note that in each version the newline character \n is actually read and counted. This may not be the desired behavior, so you must take care of it if need be.
Notice how in the print() function used above, the end='' argument was added. Remove it and run the code again to see the effect of the newline that was read in.
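One common way to take care of the newline is to strip it from each line before counting or storing the text. The sketch below assumes a small two-line sample file created in a temporary directory; the file name and variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as fout:
        fout.write('Beautiful is better than ugly.\n'
                   'Explicit is better than implicit.\n')

    raw_count, stripped_count = 0, 0
    lines = []
    with open(path, 'r') as fin:
        for line in fin:
            raw_count += len(line)        # includes the trailing '\n'
            text = line.rstrip('\n')      # remove only the newline
            stripped_count += len(text)
            lines.append(text)

print(raw_count, stripped_count)
print(lines)
```

Using rstrip('\n') rather than a bare strip() removes the newline while leaving any other leading or trailing whitespace in the line untouched.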
Recall, to write to a file you can use the 'w' mode in the open() function.
To write to the file use the .write() method, providing the string to write as its argument.
Let's take a look.
This example will write 10 numbers to the file nums.txt. Since the argument to .write() must be a string, we use the str() function to convert each number first.
with open('nums.txt', 'w') as fout:
    for num in range(1, 11):
        line = str(num) + '\n'
        fout.write(line)
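To confirm what was written, the numbers can be read back and converted with int(). This sketch also uses print() with its file argument as an alternative to .write(), since print() does the string conversion and adds the newline itself. It writes to a temporary directory rather than the working directory, which is an assumption made to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'nums.txt')

    # print() converts each number to a string and appends '\n' for us.
    with open(path, 'w') as fout:
        for num in range(1, 11):
            print(num, file=fout)

    # Read the lines back; int() ignores the trailing newline.
    with open(path, 'r') as fin:
        numbers = [int(line) for line in fin]

total = sum(numbers)
print(numbers)
print(total)
```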
Files are saved on your storage device. Such devices are managed by the OS, whether macOS, Linux or Windows.
Usually files are organized into directories or folders, with one such directory acting as the current directory or working directory.
It is important to understand that Python resolves a relative filename against the current directory, which is usually (but not always) the directory containing your source file (.py) or notebook (.ipynb).
The os module provides functions for working with files and directories.
To get the current directory use os.getcwd().
import os
print(os.getcwd())
/Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks
The path above is an absolute path. An absolute path starts with the root directory, / in my case on a Mac.
A relative path, in contrast, is given in terms of the current directory. Thus, if I wanted to specify a relative path to the topics directory I would use .. where . corresponds to the current directory and .. to its parent.
Now, to obtain the absolute path of a file, use os.path.abspath().
print(os.path.abspath('rules.txt'))
/Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/rules.txt
To check if a file or directory exists, use os.path.exists().
print(os.path.exists('rules.txt'))
True
To check if a path is a directory, use os.path.isdir(). If the returned value is True, the path is an existing directory; otherwise it is either something else (such as a regular file) or does not exist.
print(os.path.isdir('rules.txt'))
print(os.path.isdir('/etc'))
False
True
To check if a path is a regular file, use os.path.isfile().
print(os.path.isfile('rules.txt'))
print(os.path.isfile('/etc'))
True
False
To list the contents of a directory use os.listdir().
cwd = os.getcwd()
l = os.listdir(cwd)
print(f'There are {len(l)} files in the current directory.')
There are 29 files in the current directory.
Let's put some of these methods to use with an example.
The following recursive function will walk through each directory and print out the files it finds.
import os

def walk(dir):
    for file in os.listdir(dir):
        # Build each file's absolute path.
        path = os.path.join(dir, file)
        if os.path.isfile(path):
            print(f'File: {path}')
        else:
            print(f'Directory: {path}')
            walk(path)

walk(os.getcwd())
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/ds.db
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/.DS_Store
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/en2sp.db
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/nums.txt
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/iteration.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/rules.txt
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/lists.ipynb
Directory: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/__pycache__
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/__pycache__/lineCount.cpython-39.pyc
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/tuples.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/word-play-case-study.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/data-structures-case-study.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/functions.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/Inheritance.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/emma.txt
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/classes-and-functions.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/files.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/variables.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/convert.txt
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/classes-and-methods.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/dictionaries.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/template.ipynb
Directory: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/.ipynb_checkpoints
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/words.txt
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/strings.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/functions-return.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/classes-and-objects.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/linecount.py
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/helloWorld.ipynb
File: /Users/stornar/Documents/siue/Learning Series/PythonLS/2021.Spring/topics/notebooks/conditionals.ipynb
The os.path.join() function joins the directory path and the filename using the correct separator for the OS. Because we start from os.getcwd(), which is an absolute path, the result is the full absolute path to the file itself.
Note: The os module offers the walk() function and you should take a look at it, as it is more versatile than our version.
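As a rough comparison, here is a sketch of the same traversal written with os.walk(), which yields a (dirpath, dirnames, filenames) tuple for every directory it visits. The helper name list_files and the small temporary directory tree are assumptions for illustration.

```python
import os
import tempfile

def list_files(top):
    """Return the paths of all files under top, using os.walk()."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)

# Build a tiny directory tree to walk: a.txt and sub/b.txt.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(tmp, rel), 'w') as f:
            f.write('x')
    # Show the results relative to the top directory for readability.
    files = [os.path.relpath(p, tmp) for p in list_files(tmp)]

print(files)
```

No explicit recursion is needed here, because os.walk() descends into the subdirectories itself.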
FileNotFoundError
Attempting to open a file that does not exist for reading will raise a FileNotFoundError as in:
fin = open('doesNotExist.txt', 'r')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/var/folders/_5/pbybv9c90j77vqh9wby2v8140000gq/T/ipykernel_26321/2435978521.py in <module>
----> 1 fin = open('doesNotExist.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: 'doesNotExist.txt'
PermissionError
Attempting to access a file you do not have permission to access will raise a PermissionError as in:
fout = open('/etc/passwd', 'w')
---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
/var/folders/_5/pbybv9c90j77vqh9wby2v8140000gq/T/ipykernel_26321/1266459208.py in <module>
----> 1 fout = open('/etc/passwd', 'w')

PermissionError: [Errno 13] Permission denied: '/etc/passwd'
IsADirectoryError
If you try to open a directory instead of a file you will raise an IsADirectoryError as in:
open('/home')
---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
/var/folders/_5/pbybv9c90j77vqh9wby2v8140000gq/T/ipykernel_26321/4054328512.py in <module>
----> 1 open('/home')

IsADirectoryError: [Errno 21] Is a directory: '/home'
Now, you could write a series of if statements to handle such possibilities, but this approach is rarely recommended.
Mixing error handling code via if statements with normal code makes your program hard to read and maintain. A better, more structured approach is to use exception handling with a try statement.
try:
    # statements that may raise exceptions
except some_error as err:
    # handle specific error raised
except:
    # handle all other exceptions
else:
    # statements executed if no exception raised
The basic idea is you try some code that may raise an exception.
Each except some_error as err: clause lists the specific error you want to handle. You may list as many as you want.
The bare except clause allows you to handle (catch) any exception you did not explicitly handle with any of the above except clauses. A catch-all clause, so to speak.
Finally, the else clause allows you to execute code if no exception was raised.
Let's look at an example.
# Test all assignments in turn by commenting/uncommenting.
filename, mode = 'rules.txt', 'r'
# filename, mode = 'doesNotExist.txt', 'r'
# filename, mode = '/etc/passwd', 'w'
# filename, mode = '/home', 'r'
try:
    f = open(filename, mode)
except (FileNotFoundError, PermissionError, IsADirectoryError) as err:
    print(err)
else:
    f.close()
    print('No issues.')
No issues.
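The same try/except/else pattern can be wrapped in a function so that callers get a result or a clear signal of failure. The sketch below is one possible arrangement, not the only one: the helper name count_lines and the choice to return None on failure are assumptions. It catches OSError, the parent class of FileNotFoundError, PermissionError and IsADirectoryError, so all three cases are handled by one clause.

```python
import os
import tempfile

def count_lines(filename):
    """Return the number of lines in filename, or None if it cannot be opened."""
    try:
        f = open(filename, 'r')
    except OSError as err:
        # Covers a missing file, a permission problem, or a directory.
        print(f'Could not open {filename!r}: {err}')
        return None
    else:
        # Only reached when open() succeeded; 'with' closes the file.
        with f:
            return sum(1 for _ in f)

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, 'good.txt')  # hypothetical file name
    with open(good, 'w') as fout:
        fout.write('one\ntwo\nthree\n')

    ok_result = count_lines(good)
    missing_result = count_lines(os.path.join(tmp, 'doesNotExist.txt'))
    dir_result = count_lines(tmp)  # a directory, not a file

print(ok_result, missing_result, dir_result)
```

Keeping only the open() call inside the try block means an unrelated error while counting lines is not silently mistaken for a problem opening the file.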
In addition to using flat files to permanently save your data, you may use a simple database offered by Python.
A database is a file that is organized for storing and retrieving data. It is similar to a dictionary in that a key is mapped to a value, but unlike dictionaries, databases are permanent.
Python provides the dbm module for handling such files, so let's take a look at an example.
I will use the English-to-Spanish map we saw in our dictionary discussion.
import dbm

# Create the database 'en2sp' if it does not exist ('c').
with dbm.open('en2sp', 'c') as db:
    # Let's add a few entries.
    # Notice the dictionary-like syntax.
    db['one'] = 'uno'
    db['two'] = 'dos'
    # Now let's read them.
    print(db['one'], db['two'])
    # And we can also iterate through them.
    for key in db.keys():
        print(key, db[key])
b'uno' b'dos'
b'one' b'uno'
b'two' b'dos'
Notice how the keys and values are all converted automatically to bytes objects, indicated with the b prefix.
You can still think of them as regular strings for now.
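If you need ordinary strings back, each bytes object can be converted with .decode(). The following sketch assumes the text was stored as UTF-8 and uses a temporary directory for the database file; both are assumptions made for illustration.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with dbm.open(os.path.join(tmp, 'en2sp'), 'c') as db:
        db['one'] = 'uno'
        db['two'] = 'dos'
        # Keys and values come back as bytes; decode them to get str.
        translations = {key.decode('utf-8'): db[key].decode('utf-8')
                        for key in db.keys()}

print(translations)
```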
A limitation of the dbm module is that the keys and values have to be either strings or bytes. But, what if you want to use another type?
The pickle module can save the day in those circumstances. It will translate almost any type of object into a byte string for storage, and back into an object for retrieval.
dumps() - object to byte string translation
The pickle.dumps() function will translate an object into a bytes object that dbm can store easily.
import dbm
import pickle

l = [1, 2, 3]
t = (4, 5, 6)
with dbm.open('ds', 'c') as db:
    db['list'] = pickle.dumps(l)
    db['tuple'] = pickle.dumps(t)
    print(db['list'])
    print(db['tuple'])
b'\x80\x04\x95\x0b\x00\x00\x00\x00\x00\x00\x00]\x94(K\x01K\x02K\x03e.'
b'\x80\x04\x95\t\x00\x00\x00\x00\x00\x00\x00K\x04K\x05K\x06\x87\x94.'
loads() - byte string to object translation
Notice how the values are not in a human-readable format, so this is where the pickle.loads() function comes in handy.
with dbm.open('ds') as db:
    print(pickle.loads(db['list']))
    print(pickle.loads(db['tuple']))
[1, 2, 3]
(4, 5, 6)
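The two steps can be paired up in small helper functions so the pickling is hidden from the rest of the program. This is only a sketch: the names store and fetch are hypothetical, and the database lives in a temporary directory. Keep in mind that pickle.loads() should only be used on data you trust, since unpickling can execute arbitrary code.

```python
import dbm
import os
import pickle
import tempfile

def store(db, key, obj):
    """Pickle obj and save it in the open database under key."""
    db[key] = pickle.dumps(obj)

def fetch(db, key):
    """Load the value saved under key and rebuild the original object."""
    return pickle.loads(db[key])

with tempfile.TemporaryDirectory() as tmp:
    with dbm.open(os.path.join(tmp, 'ds'), 'c') as db:
        # A nested structure that dbm could not store directly.
        store(db, 'scores', {'ann': [90, 85], 'bob': (70, 75.5)})
        restored = fetch(db, 'scores')

print(restored)
```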
Your OS will most likely offer a shell or command window that allows you to interact with it by running commands.
For example, on a Unix-like OS such as macOS, I can get a directory listing by using ls.
To do something similar with Python, i.e. be able to execute such a command from within your Python code, you can use a pipe.
A pipe object represents a running program and behaves much like a regular file.
Let's take a look at how to use it to get a directory listing.
import os

ls = 'ls'
with os.popen(ls) as sh:
    print(sh.read())
Inheritance.ipynb
__pycache__
classes-and-functions.ipynb
classes-and-methods.ipynb
classes-and-objects.ipynb
conditionals.ipynb
convert.txt
data-structures-case-study.ipynb
dictionaries.ipynb
ds.db
emma.txt
en2sp.db
files.ipynb
functions-return.ipynb
functions.ipynb
helloWorld.ipynb
iteration.ipynb
linecount.py
lists.ipynb
nums.txt
rules.txt
strings.ipynb
template.ipynb
tuples.ipynb
variables.ipynb
word-play-case-study.ipynb
words.txt
Another useful command is md5 (md5sum on most Linux systems), which returns a hash code computed from a file's contents.
This value, often referred to as a digest, acts as a compact fingerprint of the file. It is not an encrypted copy: the contents cannot be recovered from it.
You can then use the digest to detect unwanted alterations to a file.
For instance, you can hash a file and save its digest. When you want to verify the file has not been altered, hash the current file and compare the result to the original digest. If they are not the same, then the file was modified.
import os

with os.popen('md5 rules.txt') as p:
    digest = p.read()
print(digest)
MD5 (rules.txt) = 770fe13c64c76eb174a5b1815565d4d4
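Because the md5 command is not available under that name on every OS, the digest can also be computed in Python itself with the standard hashlib module. Note that this swaps the pipe for a library call, so it is an alternative to the approach above rather than the same technique. The helper name md5_digest, the chunk size, and the temporary sample file are assumptions for illustration.

```python
import hashlib
import os
import tempfile

def md5_digest(path, chunk_size=8192):
    """Return the MD5 digest of a file as a hexadecimal string."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:         # hash the raw bytes
        chunk = f.read(chunk_size)
        while chunk != b'':
            md5.update(chunk)
            chunk = f.read(chunk_size)
    return md5.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(b'hello')
    original = md5_digest(path)

    # Alter the file, then hash it again.
    with open(path, 'ab') as f:
        f.write(b'!')
    modified = md5_digest(path)

print(original)
print(modified)
print('File was modified:', original != modified)
```

Even a one-byte change produces a different digest, which is what makes the comparison described above work.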
Each file you have created that contains Python code is of course a module. Now, however, we want to see how to import modules you write.
The thing to note here is that if the imported module contains executable statements, then those statements will execute when you import the module.
This is normally not what you want when you import the module, only when you run it directly. We will see how to handle this distinction.
Here is a module linecount.py that counts how many lines a file has and prints the result.
# File: linecount.py

def lineCount(filename):
    with open(filename) as fin:
        count = 0
        for _ in fin:
            count += 1
    return count

print(lineCount('linecount.py'))
Make sure you create this module first before running the following code.
import linecount
print('End of test code')
11
End of test code
Now, in order to avoid executing the function call when you import the module, modify the call as follows:
if __name__ == '__main__':
    print(lineCount('linecount.py'))
This modification says: if the name of the module is __main__, which will be the case when you execute the module itself, then call the function, else do not.
import linecount
print('End of test code.')
End of test code.
If you want to use the lineCount() function you may use the linecount module to do so.
Since you are importing the module, its __name__ will be linecount.
print(linecount.__name__)
print(linecount.lineCount('linecount.py'))
linecount
11
The import statement creates a module object, which in our example exposes the lineCount attribute, a function.
The first time a module is imported its code is executed (which is why the print() call ran earlier), and the name of the module is added to the symbol table of the current module. This table helps Python locate things such as functions, classes, variables, etc.
Since whitespace characters (newlines, tabs, spaces) are invisible to the naked eye, they may cause issues that are hard to identify. To help, you can use the repr() function, which returns a representation of the string in which such characters appear as escape sequences.
s = '1 2\t3\n4'
print('str:', s)
print('repr:', repr(s))
str: 1 2 3 4 repr: '1 2\t3\n4'
Create a separate Python source file (.py) in VSC to complete each exercise.
Write a function called sed() that takes as arguments a pattern string, a replacement string, and two filenames.
It should read the first file and write the contents into the second file (creating it if necessary). If the pattern string appears anywhere in the file, it should be replaced with the replacement string.
If an error occurs while opening, reading, writing or closing files, your program should catch the exception, print an error message, and exit.
In a large collection of MP3 files, there may be more than one copy of the same song, stored in different directories or with different filenames. The goal of this exercise is to search for duplicates.
Write a program that searches a directory and all of its subdirectories, recursively, and returns a list of complete paths for all files with a given suffix (like .mp3). Hint: os.path provides several useful functions for manipulating file and path names.
To recognize duplicates, you can use md5 to compute a checksum for each file. If two files have the same checksum, they probably have the same contents.