name = 'John Doe'
firstInitial = name[0]
lastInitial = name[5]
print(firstInitial + '.' + lastInitial + '.')
J.D.
Notice that indexing starts at 0 and not 1. This reflects the fact that the index is an offset from the beginning of the string.
The index can be any expression that evaluates to an integer.
n = 2
print(name[n])
print(name[n + 1])
h
n
The len() function returns the number of characters in a string. As we will see later, it works with other types besides strings.
print('name:', name)
print('len:', len(name))
name: John Doe
len: 8
To access the last character in name using len(), make sure you account for 0-based indexing.
length = len(name)
print(name[length - 1])
e
A common error is to use length itself as the index, which raises an IndexError:
print(name[length])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/var/folders/_5/pbybv9c90j77vqh9wby2v8140000gq/T/ipykernel_73588/2874324896.py in <module>
----> 1 print(name[length])
IndexError: string index out of range
So remember, the index ranges from 0 to length - 1.
Python also allows negative indices. They start at -1 for the last character of the string and decrease by 1 for each position you move toward the front.
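A short sketch comparing negative indices with their positive counterparts, reusing the name variable from above:

```python
name = 'John Doe'

# Negative indices count back from the end: -1 is the last character.
print(name[-1])            # e, same as name[len(name) - 1]
print(name[-2])            # o
# The first character can be reached with -len(name).
print(name[-len(name)])    # J
```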
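One way to think about indexing is to picture each character in its own memory cell and to label each cell by its left edge, i.e. the start of the cell, or its offset.
So, for the string 'Hello', the positive indices run from 0 ('H') to 4 ('o'), and the negative indices run from -5 ('H') to -1 ('o').
Each index marks the start of a cell in memory: positive indices count forward from the left end of the string, and negative indices count backward from the right end.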
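To see both numbering schemes side by side, this small sketch prints, for each position in 'Hello', the positive index, the matching negative index, and the character. It relies on the fact that the negative index equals the positive index minus len(s).

```python
s = 'Hello'

# For each cell: positive index, equivalent negative index, character.
for i in range(len(s)):
    print(i, i - len(s), s[i])
```

This prints 0 -5 H on the first line and 4 -1 o on the last.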
Often you will need to process a string one character at a time, from start to end. This is called a traversal, and the for statement is best suited for it.
First, let's traverse a string with a while statement so we can see the advantage of the for statement.
i = 0
while i < len(name):
    print(name[i])
    i = i + 1
J o h n D o e
When using a while loop, we need to manage both the starting index and the ending index. Here i starts at 0 and ends at len(name) - 1, incrementing by 1 each time.
Often, however, we simply need to process every character, however many there are. A for loop does this without our having to work out the end index.
Let's take a look.
for c in name:
    print(c)
J o h n D o e
In this for loop, c is assigned each character in name, from the first to the last. Nice and simple.
If we need the index, we can use an indexed approach with the range() function.
for i in range(len(name)):
    print(name[i])
J o h n D o e
Here range() receives len(name) as its argument and produces the sequence 0, 1, 2, ..., len(name) - 1, which i then steps through.
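A quick sketch that makes the sequence produced by range() visible. It also shows enumerate(), a built-in that this section does not cover, as one alternative that yields each index together with its character.

```python
name = 'John Doe'

# range(len(name)) produces the indices 0 .. len(name) - 1.
print(list(range(len(name))))   # [0, 1, 2, 3, 4, 5, 6, 7]

# enumerate() yields (index, character) pairs, so you get both at once.
for i, c in enumerate(name):
    print(i, c)
```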
Let's take a quick tour of the print() function and see how we can control how values are displayed.
The print() function accepts two additional keyword arguments:
- sep=' ': specifies the string printed between the values passed to print(). The default is a space.
- end='\n': specifies the string printed after the last value, i.e. the line terminator. The default is a newline.
Let's rewrite our for loop with these print() options.
for c in name:
    print(c, sep='.', end='')
John Doe
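In the loop above, each print() call receives only one value, so sep='.' has nothing to separate and no dots appear. This sketch shows sep taking effect when a single call prints several values. The *name unpacking syntax is an addition that the section does not introduce.

```python
name = 'John Doe'

# sep is inserted only between the values of a single print() call.
print('J', 'o', 'h', 'n', sep='.')    # J.o.h.n

# Unpacking the string passes each character as a separate value.
print(*name, sep='.')                 # J.o.h.n. .D.o.e

# end replaces the newline that print() normally adds at the end.
print('no newline', end='')
print(' <- continued on the same line')
```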
Let's now use what we have learned to generate the sequence A1,B2,...,Z26.
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
number = 1
for letter in letters:
    print(letter + str(number), end=',')
    number = number + 1
A1,B2,C3,D4,E5,F6,G7,H8,I9,J10,K11,L12,M13,N14,O15,P16,Q17,R18,S19,T20,U21,V22,W23,X24,Y25,Z26,
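The loop above leaves a trailing comma after Z26. One way to avoid it, sketched here with two features this section does not cover (a list comprehension and the str.join() method), is to build every item first and let join() place commas only between items:

```python
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# enumerate(..., start=1) numbers the letters 1..26; join() puts ','
# between items only, so no comma follows the last one.
items = [letter + str(number) for number, letter in enumerate(letters, start=1)]
sequence = ','.join(items)
print(sequence)
```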
Python lets us slice sequences. A slice is a segment of a string, in other words a selection of its characters.
s = 'Monty Python'
print(s[0:5])        # indices 0 up to, but not including, 5
print(s[:5])         # from the beginning up to, but not including, index 5
print(s[5:len(s)])   # from index 5 up to, but not including, len(s)
print(s[5:])         # from index 5 to the end
print(s[:5], end='')
print(s[5:])
print(s[:])          # the whole string
Monty
Monty
 Python
 Python
Monty Python
Monty Python
The slice n:m selects the half-open range [n, m), i.e. it includes index n but excludes index m.
If you omit n, the slice starts from the beginning.
If you omit m, it goes to the end.
If you omit both, it selects the whole string.
If the range is empty, i.e. n >= m, you get an empty string.
print('s[1:1]:', s[1:1])
print('s[2:1]:', s[2:1])
s[1:1]:
s[2:1]:
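Unlike indexing, slicing is forgiving about out-of-range bounds: they are clamped to the length of the string instead of raising an error. A minimal sketch contrasting the two:

```python
s = 'Monty Python'

# Slice bounds past the end are clamped, so no error is raised.
print(repr(s[6:100]))   # 'Python'
print(repr(s[100:]))    # ''

# Indexing past the end still raises an IndexError.
try:
    s[100]
except IndexError as err:
    print('IndexError:', err)
```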
You cannot change a character in a string using the [] operator. Trying to do so raises a TypeError:
s[0] = 'm'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/_5/pbybv9c90j77vqh9wby2v8140000gq/T/ipykernel_73588/1355357284.py in <module>
----> 1 s[0] = 'm'
TypeError: 'str' object does not support item assignment
The error occurs because strings are immutable: once created, a string cannot be changed. We can, however, assign a new string to the variable s, which creates a new string object and makes s refer to it.
s = 'Monty Python Troupe'
print(s)
Monty Python Troupe
We could also use concatenation. Again, we are creating a new string.
s = 'The ' + s
print(s)
The Monty Python Troupe
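Combining slicing with concatenation gives the usual way to "change" a character: build a new string from the replacement plus the parts of the old string you want to keep. A small sketch:

```python
s = 'Monty Python'

# s[0] = 'm' fails, so build a new string instead:
# the replacement character plus everything after index 0.
lowered = 'm' + s[1:]
print(lowered)   # monty Python
print(s)         # Monty Python (the original is unchanged)
```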
Searching is a common task in programming. Here we search for a character in a string and return its index in the string; if the character is not found, we return -1.
def find(start, character, s):
    '''Return the index of character or -1 if not found.'''
    index = start
    while index < len(s):
        if s[index] == character:
            return index
        index = index + 1
    # Not found.
    return -1
print('P is at index:', find(0, 'P', 'Monty Python'))
print('T is at index:', find(0, 'T', 'Monty Python'))
P is at index: 6
T is at index: -1
We can rewrite the find() function using a for statement and its else clause.
def findFor(start, character, s):
    '''Return the index of character or -1 if not found.'''
    for index in range(start, len(s)):
        if s[index] == character:
            return index
    else:
        # Not found.
        return -1
print('P is at index:', findFor(0, 'P', 'Monty Python'))
print('T is at index:', findFor(0, 'T', 'Monty Python'))
P is at index: 6
T is at index: -1
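The else clause executes when the loop finishes without a break, here when the range is exhausted. In our search function, that happens only if the character is not found, because finding it returns from the function immediately.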
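The else clause pairs naturally with break, which findFor() does not use. This sketch uses a hypothetical helper, report(), to show that leaving the loop with break skips the else clause, while running through the whole sequence executes it.

```python
def report(character, s):
    '''Hypothetical helper showing when a for loop's else clause runs.'''
    for c in s:
        if c == character:
            message = 'found ' + character
            break                              # exiting via break skips else
    else:
        message = character + ' not found'     # runs only if no break occurred
    return message

print(report('P', 'Monty Python'))   # found P
print(report('T', 'Monty Python'))   # T not found
```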
To count how many times a character appears in a string, we can traverse the string one character at a time.
Let's see this in action:
def frequency(start, character, s):
    '''Returns the frequency of occurrence of character in s, starting at index start.'''
    count = 0
    for i in range(start, len(s)):
        if s[i] == character:
            count = count + 1
    return count
print('H appears', frequency(0, 'H', 'Hello'), 'time(s) in Hello')
print('l appears', frequency(0, 'l', 'Hello all'), 'time(s) in Hello all')
H appears 1 time(s) in Hello
l appears 4 time(s) in Hello all
Now, as a hands-on practice exercise, rewrite the frequency() function to make use of the find() function.
Try not to look at the solution below until you have given it a go.
def frequencyFind(start, character, s):
    '''Returns the frequency of occurrence of character in s, starting at index start.'''
    count = 0
    index = find(start, character, s)
    while index != -1:
        count = count + 1
        # Resume the search just past the last match.
        index = find(index + 1, character, s)
    return count
print('H appears', frequencyFind(0, 'H', 'Hello'), 'time(s) in Hello')
print('l appears', frequencyFind(0, 'l', 'Hello all'), 'time(s) in Hello all')
H appears 1 time(s) in Hello
l appears 4 time(s) in Hello all
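Methods are similar to functions, but they differ in how you call them: a method is called on a value using dot notation, as in s.upper(), rather than receiving that value as an argument.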
The string object offers a number of methods we can use. Let's look at some, paying close attention to the calling syntax.
Let's start with the find() method, which returns the index of the first occurrence of the searched-for string, or -1 if it is not found.
s = 'Hello'
print(s.find('l'))
2
# Start searching at index 3 to find the second occurrence of 'l'.
print(s.find('l', 3))
3
# Find 'el'.
print(s.find('el'))
1
# Convert str to upper case.
print(s.upper(), ':', s)
HELLO : Hello
Notice that upper() returned a new, all-caps copy of s, and the original string remained unchanged.
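A few more standard str methods in the same style, sketched for comparison with the functions written earlier. count() and lower() are not introduced in this section. Note that str.find() takes the substring first and an optional start second, the reverse of our find(start, character, s).

```python
s = 'Hello'

print(s.find('z'))     # -1: like our find(), -1 means "not found"
print(s.count('l'))    # 2: a built-in counterpart of frequency()
print(s.lower())       # hello: a new string is returned
print(s)               # Hello: s itself is unchanged
```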
The boolean operator in returns True if one string occurs within another, and False otherwise.
print('H in Hello?', 'H' in 'Hello')
H in Hello? True
We can use in to find the characters that two strings have in common.
For example, to find which characters of 'apples' also appear in 'oranges', we can write a function:
def printInBoth(s1, s2):
    '''Prints all characters in common.'''
    for c in s1:
        if c in s2:
            print(c, end=',')
print('Common characters:', end = ' ')
printInBoth('apples', 'oranges')
Common characters: a,e,s,
Let's rewrite the above function to return a string with all the common characters instead.
def inBoth(s1, s2):
    '''Returns a string with all the characters in common.'''
    commonCharacters = ''
    for c in s1:
        if c in s2:
            commonCharacters += c
    return commonCharacters
print('Common characters:', inBoth('apples', 'oranges'))
Common characters: aes
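inBoth() adds a character every time it occurs in s1, so repeated letters appear more than once: inBoth('banana', 'bandana') returns 'banana'. A hypothetical variant, inBothUnique(), sketched below, skips characters it has already recorded:

```python
def inBothUnique(s1, s2):
    '''Hypothetical variant of inBoth(): each common character appears once.'''
    commonCharacters = ''
    for c in s1:
        # Add c only if it is in s2 and has not been recorded yet.
        if c in s2 and c not in commonCharacters:
            commonCharacters += c
    return commonCharacters

print(inBothUnique('banana', 'bandana'))   # ban
```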
When comparing strings, be mindful that case matters: every uppercase letter A-Z orders before every lowercase letter a-z. This is because strings are compared character by character using each character's numeric code, which for these letters is its ASCII value.
Let's look at some examples.
print('A < a?', 'A' < 'a')
print('a < A?', 'a' < 'A')
print('apple < Banana?', 'apple' < 'Banana')
A < a? True
a < A? False
apple < Banana? False
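The built-in ord() function (also used in the exercises below) reveals the numeric codes behind these results. A short sketch:

```python
# Each character has a numeric code; string comparison uses these numbers.
print(ord('A'), ord('Z'))   # 65 90
print(ord('a'), ord('z'))   # 97 122

# 'apple' < 'Banana' is False because the first characters already differ:
# ord('a') is 97, which is greater than ord('B'), 66.
print(ord('a') < ord('B'))  # False
```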
To compare strings in the traditional, case-insensitive way, where 'a' and 'A' are considered equal, convert both strings to the same case first.
print("'a' == 'A'?", 'a' == 'A')
print("'a'.upper() == 'A'?", 'a'.upper() == 'A')
print("'a' == 'A'.lower()?", 'a' == 'A'.lower())
'a' == 'A'? False
'a'.upper() == 'A'? True
'a' == 'A'.lower()? True
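The same idea applies when sorting. This sketch uses the built-in sorted() with its key parameter (not covered in this section) so that each word is compared by its lowercase copy, while the words themselves keep their original case:

```python
words = ['cherry', 'apple', 'Banana']

# Default ordering compares character codes, so 'Banana' sorts first.
print(sorted(words))                  # ['Banana', 'apple', 'cherry']

# key=str.lower compares lowercase copies, giving the traditional order.
print(sorted(words, key=str.lower))   # ['apple', 'Banana', 'cherry']
```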
Create a separate Python source file (.py) in VSC to complete each exercise.
A string slice can take a third index that specifies the "step size", that is, the distance between the indices of successive characters. A step size of 2 means every other character; 3 means every third, etc.
fruit = 'banana'
fruit[0:5:2] # 'bnn'
A step size of -1 goes through the word backwards, so the slice [::-1] generates a reversed string.
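A few step-size slices on the same fruit string, with the indices each one selects:

```python
fruit = 'banana'

print(fruit[0:5:2])   # bnn: indices 0, 2, 4
print(fruit[::2])     # bnn: every other character of the whole string
print(fruit[1::2])    # aaa: indices 1, 3, 5
print(fruit[::-1])    # ananab: a step of -1 walks backwards
```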
Use this idiom to write a one-line version of is_palindrome() from practice problem p2 of the functionsReturn discussion.
A Caesar cypher is a weak form of encryption that involves "rotating" each letter by a fixed number of places. To rotate a letter means to shift it through the alphabet, wrapping around to the beginning if necessary, so 'A' rotated by 3 is 'D' and 'Z' rotated by 1 is 'A'.
To rotate a word, rotate each letter by the same amount. For example, 'cheer' rotated by 7 is 'jolly' and 'melon' rotated by -10 is 'cubed'. In the movie 2001: A Space Odyssey, the ship's computer is called HAL, which is IBM rotated by -1.
Write a function called rotate_word() that takes a string and an integer as parameters, and returns a new string that contains the letters from the original string rotated by the given amount.
You might want to use the built-in function ord(), which converts a character to a numeric code, and chr(), which converts a numeric code to a character. Letters of the alphabet are encoded in alphabetical order, so, for example, ord('c') - ord('a') results in 2, because 'c' is the two-eth letter of the alphabet (counting from zero). But beware: the numeric codes for uppercase letters are different.
Potentially offensive jokes on the Internet are sometimes encoded in ROT13, which is a Caesar cypher with rotation 13. If you are not easily offended, find and decode some of them.