A file is a named unit of information stored on your computer that persists even when the computer is powered off.
Entering data by hand through the input function is tedious and error-prone. Files can be created by a user (e.g., in Atom, File -> New File) or as the output of another program.
The types of files we primarily cover in this class contain text and numbers, and they usually have file extensions such as .txt (text), .csv (comma-separated values), and .data. Other files contain binary data (e.g., images), but we don't cover those.
When working with files, the common pattern is to first open them, then use them (read and/or write), then close them (which frees them for other people/programs and makes sure anything you wrote is actually saved).
In this class we focus on reading from a file (i.e., getting information from), but writing (i.e., adding/changing information) to a file is quite similar (and covered in the book).
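Since writing is only mentioned in passing, here is a minimal sketch of it that follows the same open/use/close pattern. It assumes a throwaway file (the name notes.txt is made up) inside a temporary folder, so no real files are touched. Mode "w" creates the file or erases an existing one, and mode "a" appends to the end.

```python
import os
import tempfile

# Work in a temporary folder; "notes.txt" is a hypothetical file name
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")

# "w" creates the file (or erases it if it already exists)
outfile = open(path, "w")
outfile.write("first line\n")   # unlike print, write does NOT add a newline
outfile.write("second line\n")
outfile.close()

# "a" adds to the end of the existing contents
outfile = open(path, "a")
outfile.write("third line\n")
outfile.close()

# Read it back to check what ended up in the file
infile = open(path, "r")
text = infile.read()
infile.close()
print([text])  # ['first line\nsecond line\nthird line\n']
```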
# Open a file by providing its name and a "mode" (for us, r=read)
myfile = open("myfile.txt", "r")
# The myfile variable now lets you interact with the contents
# of the file via various functions
filecontents = myfile.read()
# CLOSE the file
myfile.close()
# Continue
print(filecontents)
Since it is quite easy to forget to close a file, or for your program to never reach the call to close (e.g., because an error occurs first), a safer way to code is to use the following...
# The "with" promises to close myfile
# once Python completes the next code block
# for any reason
with open("myfile.txt", "r") as myfile:
    filecontents = myfile.read()
# The file is now closed
print(filecontents)
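To see that promise in action, the sketch below creates a throwaway myfile.txt in a temporary folder and then raises an error on purpose inside the with block. The file still ends up closed, which we can check through the file object's closed attribute.

```python
import os
import tempfile

# Create a small file to experiment with (in a temporary folder)
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("1 2 3\n")

try:
    with open(path, "r") as myfile:
        filecontents = myfile.read()
        # Simulate a bug that happens before we are done with the file
        raise ValueError("something went wrong")
except ValueError as err:
    print("Caught:", err)

# The block ended early, but "with" still closed the file
print(myfile.closed)  # True
```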
If the file is in the current working directory (typically the directory you run your program from), simply provide its name. Otherwise, you will have to supply a path (i.e., the sequence of directories leading to the file).
For example... open("/dir1/dir2/myfile.txt", "r")
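A leading / (as in that example) makes the path absolute, starting from the top of the file system; without it, the path is relative to the current working directory. One way to build a path without hard-coding the separator (/ on macOS/Linux, \ on Windows) is os.path.join from the standard library. The directory names below are the same placeholders as in the example, so the file will most likely not be found.

```python
import os

# The directory that bare file names are looked up in
print(os.getcwd())

# Build a relative path piece by piece; the separator matches the OS
path = os.path.join("dir1", "dir2", "myfile.txt")
print(path)  # dir1/dir2/myfile.txt on macOS/Linux

# Checking first gives a friendlier message than a crash
if os.path.exists(path):
    with open(path, "r") as myfile:
        print(myfile.read())
else:
    print("Could not find", path)
```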
There are multiple ways to get data out of a file...
# Read the entire file as a giant string
with open("myfile.txt", "r") as myfile:
    filecontents = myfile.read()
# Note that all line breaks in the file are kept in the variable
# (printing it inside a list shows them as \n)
print([filecontents])
# Once in a variable, we can treat this as any other string
print(filecontents.split())
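Because read() gives back one big string, how you split it matters. This sketch uses a made-up string in place of real file contents to compare split() with two line-based alternatives, splitlines() and split("\n").

```python
# Pretend this string came from myfile.read()
filecontents = "3 apples\n5 pears\n\n2 plums\n"

# split() breaks on ANY whitespace (spaces and newlines) and drops empty pieces
by_whitespace = filecontents.split()
print(by_whitespace)  # ['3', 'apples', '5', 'pears', '2', 'plums']

# splitlines() breaks only at line boundaries and removes the newlines
by_line = filecontents.splitlines()
print(by_line)        # ['3 apples', '5 pears', '', '2 plums']

# split("\n") is similar, but the trailing newline leaves an extra '' at the end
by_newline = filecontents.split("\n")
print(by_newline)     # ['3 apples', '5 pears', '', '2 plums', '']
```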
# Sometimes we care about data line-by-line (think spreadsheets)
# There are a few ways to do this...
# Loop over all the lines
with open("myfile.txt", "r") as myfile:
    for line in myfile:
        # Again, notice the newline
        print([line])
print()
# Read all the lines into a list of strings
with open("myfile.txt", "r") as myfile:
    lines = myfile.readlines()
print(lines)
# for line in lines:
#     print(line)
print()
# Read a line at a time (e.g., if line contents are different)
with open("myfile.txt", "r") as myfile:
    firstline = myfile.readline()
    print(['FIRST', firstline])
    # The loop picks up where readline left off (the second line)
    for nextline in myfile:
        print([nextline])
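A common case where the first line is different is a spreadsheet-style file whose first line is a header. The sketch below writes a small made-up CSV file (scores.csv, with invented names and scores) to a temporary folder, reads the header with readline, and then loops over the remaining lines to total one column.

```python
import os
import tempfile

# Build a small example file (hypothetical contents) in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "scores.csv")
with open(path, "w") as f:
    f.write("name,score\n")
    f.write("ana,90\n")
    f.write("ben,75\n")

with open(path, "r") as myfile:
    # The first line is a header, so handle it separately
    header = myfile.readline().strip().split(",")
    total = 0
    # The for loop continues from the second line onward
    for line in myfile:
        name, score = line.strip().split(",")
        total += int(score)

print(header)  # ['name', 'score']
print(total)   # 165
```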
Notice that, just as with the input function, all data read from a file comes in as strings.
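So before doing arithmetic you have to convert. A short sketch with made-up lines shows what goes wrong without conversion and how int() and float() handle the trailing newline.

```python
# Each line read from a file is a string, newline included
line = "42\n"

# + on strings glues text together instead of adding numbers
joined = line + line
print([joined])          # ['42\n42\n']

# int() and float() ignore surrounding whitespace such as the newline
as_int = int(line) + 1
as_float = float("3.5\n") * 2
print(as_int, as_float)  # 43 7.0

# A line holding several numbers needs split() before converting
row = "1 2 3\n"
row_total = 0
for piece in row.split():
    row_total += int(piece)
print(row_total)         # 6
```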
Let's sum all the single digits in a file (i.e., whitespace-separated entries that are exactly one digit, 0-9)
import string
with open("myfile.txt", "r") as myfile:
    contents = myfile.read().split()
print(contents)
# Turn the string "0123456789" into
# a list of individual digit characters
digits = list(string.digits)
print(digits)
# Named total (not sum) to avoid hiding Python's built-in sum()
total = 0
for element in contents:
    # Only entries that are exactly one digit match
    if element in digits:
        total += int(element)
print("Sum of all digits in the file: {}".format(total))
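The code above only counts entries that are a single digit on their own, so a number like 12 is skipped entirely. If you instead want every digit character (so 12 contributes 1 + 2), you can loop over the characters of the string. This sketch uses a made-up string in place of the file contents to compare the two.

```python
import string

# Pretend this string came from myfile.read()
contents = "7 apples, 12 pears and 3 plums"

digits = list(string.digits)  # ['0', '1', ..., '9']

# Entry version (as above): "12" is not a single digit, so it is skipped
token_total = 0
for element in contents.split():
    if element in digits:
        token_total += int(element)

# Character version: looping over a string visits one character at a time
char_total = 0
for ch in contents:
    if ch in digits:
        char_total += int(ch)

print(token_total)  # 10
print(char_total)   # 13
```

Note that digits is kept as a list on purpose: "12" in string.digits would be True, because in on a string checks for a substring.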
How about using the dictionary file from the HW, along with a variant of the Hamming distance from earlier in the semester, to make a quick spell checker?
# Let's say that the distance
# between two words is the difference
# in their lengths + the number of
# positions (up to the shorter length)
# where their letters differ
def word_dist(w1, w2):
    dist = abs(len(w1) - len(w2))
    min_len = min(len(w1), len(w2))
    for l1, l2 in zip(w1[:min_len], w2[:min_len]):
        if l1 != l2:
            dist += 1
    return dist
print(word_dist('apple', 'apple'))   # 0
print(word_dist('apple', 'orange'))  # 6
print(word_dist('apple', 'appie!'))  # 2
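For comparison, the classic Hamming distance is defined only for strings of the same length and simply counts the positions where they differ; word_dist extends it to unequal lengths by adding the length difference. A minimal sketch (the function name hamming is our own):

```python
def hamming(w1, w2):
    # Classic Hamming distance: only defined for equal-length strings
    if len(w1) != len(w2):
        raise ValueError("Hamming distance needs equal-length strings")
    dist = 0
    for l1, l2 in zip(w1, w2):
        if l1 != l2:
            dist += 1
    return dist

print(hamming('apple', 'apply'))      # 1
print(hamming('karolin', 'kathrin'))  # 3

try:
    hamming('apple', 'orange')
except ValueError as err:
    print("Error:", err)
```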
def find_closest(w, words_fname):
    with open(words_fname, "r") as wfile:
        # Start with the first word in the file as the best guess so far
        candidate = wfile.readline().strip()
        closest = [candidate]
        closest_dist = word_dist(w, candidate)
        # Then compare against every remaining word
        for candidate in wfile:
            candidate = candidate.strip()
            dist = word_dist(w, candidate)
            if dist < closest_dist:
                # Strictly better: start a new list of closest words
                closest = [candidate]
                closest_dist = dist
            elif dist == closest_dist:
                # Tie: keep all equally close words
                closest.append(candidate)
    return closest, closest_dist
def spellcheck(w):
    # True for a known word, otherwise the list of closest suggestions
    closest, closest_dist = find_closest(w.lower(), 'words_alpha.txt')
    if closest_dist == 0:
        return True
    else:
        return closest
print(spellcheck('apple'))
print(spellcheck('appie'))
print(spellcheck('craezee'))
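Each call to find_closest re-reads the entire dictionary file, which is fine for a few words but slow when checking many. One alternative design reads the file once into a list and searches that list instead. The sketch below assumes a tiny made-up word list standing in for words_alpha.txt; load_words and find_closest_in_list are hypothetical helper names, not part of the course code.

```python
import os
import tempfile

def word_dist(w1, w2):
    # Same distance as above: length difference + mismatched letters
    dist = abs(len(w1) - len(w2))
    for l1, l2 in zip(w1, w2):  # zip stops at the end of the shorter word
        if l1 != l2:
            dist += 1
    return dist

def load_words(words_fname):
    # Hypothetical helper: read the dictionary once, one word per line
    words = []
    with open(words_fname, "r") as wfile:
        for line in wfile:
            words.append(line.strip())
    return words

def find_closest_in_list(w, words):
    # Hypothetical helper: same tie-keeping search, but over a list
    closest = []
    closest_dist = None
    for candidate in words:
        dist = word_dist(w, candidate)
        if closest_dist is None or dist < closest_dist:
            closest = [candidate]
            closest_dist = dist
        elif dist == closest_dist:
            closest.append(candidate)
    return closest, closest_dist

# A tiny stand-in for words_alpha.txt, written to a temporary folder
path = os.path.join(tempfile.mkdtemp(), "tiny_words.txt")
with open(path, "w") as f:
    f.write("apple\napply\nmaple\norange\n")

words = load_words(path)  # the file is read only once
for w in ["apple", "appie", "appl"]:
    print(w, find_closest_in_list(w, words))
# apple (['apple'], 0)
# appie (['apple'], 1)
# appl (['apple', 'apply'], 1)
```

The tradeoff is memory: the whole dictionary now sits in a list while the program runs, in exchange for not reopening the file on every check.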