Strings

A string is a sequence, which means it is an ordered collection of other values.

String: a sequence

A string is a sequence of characters, and each character can be accessed (indexed) using brackets ([]).
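As a minimal sketch, assuming a variable name that holds the sample value 'Monty' (chosen here only for illustration), indexing might look like this:

```python
# 'Monty' is an illustrative sample value; any string works.
name = 'Monty'

first = name[0]   # character at offset 0
third = name[2]   # character at offset 2

print(first)  # M
print(third)  # n
```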

Notice how the indexing starts at 0 and not 1. This reflects the fact that the index is an offset from the beginning of the string.

The index can be any expression that evaluates to an integer.
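For instance, an arithmetic expression can serve as the index. The variable i and the sample string below are assumptions made for this sketch:

```python
name = 'Monty'  # illustrative sample value
i = 1

# The expression i + 1 is evaluated first, then used as the offset.
letter = name[i + 1]
print(letter)  # n
```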

len()

The len function returns the number of characters in a string. It also works with other sequence types, as we will see later.
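A short sketch, again assuming the sample value 'Monty' for name:

```python
name = 'Monty'  # illustrative sample value

length = len(name)
print(length)  # 5
```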

To access the last character in name using len, make sure you account for the 0-based indexing.
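One way to write this, assuming name holds the sample value 'Monty':

```python
name = 'Monty'  # illustrative sample value

# The last character sits at offset len(name) - 1, not len(name).
last = name[len(name) - 1]
print(last)  # y
```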

A common error is to use the length itself as the index, which raises an IndexError.
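The sketch below triggers that error on purpose and catches it so the script keeps running. The try/except wrapper is an addition for demonstration only:

```python
name = 'Monty'  # illustrative sample value
length = len(name)

try:
    last = name[length]   # offset 5 is one past the final character
except IndexError as err:
    last = None
    print('IndexError:', err)
```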

So remember, the index ranges from 0 to length - 1.

Negative index

Python indexing allows for a negative index as well. Negative indexes start at -1 for the last character and decrease by 1 for each step toward the front of the string.
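A brief illustration, assuming the sample value 'Monty':

```python
name = 'Monty'  # illustrative sample value

last = name[-1]            # 'y'
second_last = name[-2]     # 't'
first = name[-len(name)]   # 'M', the most negative valid index

print(last, second_last, first)  # y t M
```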

A way to think about indexing is to consider the lower left corner of each memory cell, i.e. the start of the cell or its offset.

So, for the string 'Hello' the indexing is:

[Figure: string offsets]

Each dot represents the start of the cell in memory and its corresponding offset or index: positive when counted from the left, negative when counted from the right.
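The same offsets can be tabulated in code. This sketch lists each character of 'Hello' with its positive and negative index; the names word and offsets are introduced here for illustration:

```python
word = 'Hello'

offsets = []
for i in range(len(word)):
    negative = i - len(word)          # same cell, counted from the right
    offsets.append((word[i], i, negative))
    print(word[i], i, negative)
```

Each row shows one character followed by its positive and negative offsets, so 'H' is at 0 and -5 while 'o' is at 4 and -1.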

Traversing with a for loop

Often you will need to process a string, one character at a time, from start to end. This is called traversing and the for statement is best suited for this.

Let's take a look at how we can traverse using a while statement so we can see the advantage of the for.
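A minimal while-based traversal, assuming name holds the sample value 'Monty':

```python
name = 'Monty'  # illustrative sample value

i = 0                      # starting index
while i < len(name):       # stop once i reaches len(name)
    print(name[i])
    i = i + 1              # move to the next character
```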

When using a while loop we need to be concerned with the starting index as well as the ending index. Here i starts at 0 and ends at len(name) - 1, incremented by 1 each time.

As is often the case, however, we need to process all characters regardless of how many there are. We can use a for loop and avoid determining the end index.

Let's take a look.
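A sketch of the for version, using the same assumed sample value:

```python
name = 'Monty'  # illustrative sample value

for c in name:
    print(c)    # prints one character per line
```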

In this for loop c is assigned each character in name, from the first to the last. Nice and simple.

We could also use an indexed approach if need be, using the range() function.
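A possible indexed traversal, again assuming the sample value 'Monty':

```python
name = 'Monty'  # illustrative sample value

for i in range(len(name)):
    print(i, name[i])   # the index followed by the character at that index
```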

The range() function accepts len(name) as its argument and returns the sequence 0 1 2 ... len(name) - 1, which i then traverses over.

print()

Let's take a quick tour of the print() function and see how we can control how values are displayed.

The print() function accepts several keyword arguments; two of them control how output is formatted:

sep = ' ' : specifies the string placed between the values being printed. The default is a single space.

end = '\n' : specifies the string printed after the last value. The default is the newline.

Let's rewrite our for with the new print() options.
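A sketch of how the traversal might look with these options, assuming the sample value 'Monty':

```python
name = 'Monty'  # illustrative sample value

# end=' ' keeps the output on one line instead of one character per line.
for c in name:
    print(c, end=' ')
print()  # finish the line

# sep controls what is printed between the arguments of a single call.
print('M', 'o', 'n', sep='-')  # M-o-n
```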

Let's now use some learned features to generate the sequence: A1,B2,...Z26.
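One way this could be done with indexing, range(), sep and end. The names letters and terminator are introduced for this sketch:

```python
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

for i in range(len(letters)):
    # The last item ends the line; every other item is followed by a comma.
    if i == len(letters) - 1:
        terminator = '\n'
    else:
        terminator = ','
    # sep='' glues the letter to its number, e.g. A1
    print(letters[i], i + 1, sep='', end=terminator)
```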

Slicing

Python offers us the ability to slice into our sequences. A slice is just a segment of a string or a selection of characters if you will.

The slice n:m defines the closed-open (half-open) range [n, m), i.e. including n but not m.

If you omit n it starts from the beginning.

If you omit m it goes to the end.

If you omit both, it selects the whole string.

If the range is empty, i.e. n >= m for non-negative indexes, you get an empty string.
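The slicing rules above can be sketched as follows, assuming the sample value 'Monty':

```python
name = 'Monty'  # illustrative sample value

middle = name[1:4]   # 'ont'   offsets 1, 2 and 3; offset 4 is excluded
prefix = name[:2]    # 'Mo'    n omitted: start from the beginning
suffix = name[2:]    # 'nty'   m omitted: go to the end
whole = name[:]      # 'Monty' both omitted: the whole string
empty = name[3:1]    # ''      n >= m: empty string

print(middle, prefix, suffix, whole, repr(empty))
```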

Strings are Immutable

This means that you cannot change a string character using the [] operator.
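The sketch below attempts such an assignment and catches the resulting TypeError. The try/except wrapper is added only so the example runs to completion:

```python
name = 'Monty'  # illustrative sample value

try:
    name[0] = 'J'            # item assignment is not supported for str
except TypeError as err:
    print('TypeError:', err)

print(name)  # Monty (unchanged)
```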

The reason for the error is that strings are immutable, or non-changeable. However, we can assign a new string value to our variable, which creates a new string object and leaves the old one untouched.

We could also use concatenation. Again, we are creating a new string.
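Both approaches are sketched here. The names original and new_name, and the replacement value 'Jonty', are assumptions made for illustration:

```python
name = 'Monty'           # illustrative sample value
original = name          # second reference to the same string object

name = 'Jonty'           # rebinds name to a brand-new string
print(original)          # Monty  (the old string is untouched)

# Concatenation: build a new string from a new first letter and a slice.
new_name = 'J' + original[1:]
print(new_name)          # Jonty
```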

Searching

Searching happens often in programming. We want to search for a character in a string and return its index in the string. If not found, we will return -1.
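A minimal sketch of such a search using a while loop. The parameter names word and letter are assumptions:

```python
def find(word, letter):
    """Return the index of the first occurrence of letter in word, or -1."""
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index        # found: stop searching
        index = index + 1
    return -1                   # reached the end without a match

print(find('banana', 'n'))  # 2
print(find('banana', 'x'))  # -1
```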

We can rewrite the find() function using a for statement and its else option.
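One possible for/else version, keeping the assumed parameter names word and letter:

```python
def find(word, letter):
    """Return the index of the first occurrence of letter in word, or -1."""
    for index in range(len(word)):
        if word[index] == letter:
            return index
    else:
        # Reached only when the loop runs to completion without returning.
        return -1

print(find('banana', 'n'))  # 2
print(find('banana', 'x'))  # -1
```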

The else clause executes when the loop runs to completion, i.e. when the range is exhausted without the loop being exited early. In our search function that happens only if the character is not found.

Looping and Counting

To count how many times a character appears in a string we can traverse the string one character at a time.

Let's see this in action:
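A sketch of a counting function; the name frequency() matches the exercise that follows, while the parameter names are assumptions:

```python
def frequency(word, letter):
    """Return how many times letter appears in word."""
    count = 0
    for c in word:
        if c == letter:
            count = count + 1
    return count

print(frequency('banana', 'a'))  # 3
```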

Now, as a hands-on practice exercise, rewrite the frequency() function to make use of the find() function.

Try not to look at the solution below until you have given it a go.
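One possible solution is sketched here. It assumes find() is extended with a hypothetical third parameter, start, giving the offset at which the search begins; that parameter is not part of the find() described earlier.

```python
def find(word, letter, start=0):
    """Hypothetical extension of find(): begin searching at offset start."""
    for index in range(start, len(word)):
        if word[index] == letter:
            return index
    return -1

def frequency(word, letter):
    """Count occurrences of letter by calling find() repeatedly."""
    count = 0
    index = find(word, letter)
    while index != -1:
        count = count + 1
        # Resume the search just past the match that was found.
        index = find(word, letter, index + 1)
    return count

print(frequency('banana', 'a'))  # 3
```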

String Methods

Methods are similar to functions, but differ in the way you call them.

The string object offers a number of methods we can use. Let's look at some, paying close attention to the calling syntax.

We will look at the .find() method, which returns the index of the first occurrence of the searched string.
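A short sketch of the method-call syntax, assuming a sample string s. It shows .find() and then .upper(), both standard str methods:

```python
s = 'banana'  # illustrative sample value

index = s.find('na')    # 2: offset where 'na' first appears
missing = s.find('x')   # -1: not found
print(index, missing)

shout = s.upper()       # method call: object, dot, method name, parentheses
print(shout)            # BANANA
print(s)                # banana
```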

Notice that a copy of s in all caps was returned, and the original remained unchanged.

The in Operator

The boolean operator in returns True if one string is within another.
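For example, with sample strings chosen for illustration:

```python
has_an = 'an' in 'banana'       # True: 'an' appears inside 'banana'
has_seed = 'seed' in 'banana'   # False

print(has_an, has_seed)  # True False
```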

We can use in to find the characters that two strings have in common.

If we wanted to know which characters are in common in apples and oranges we can write a function to do that.
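A sketch of such a function; the name in_both() and its parameter names are assumptions:

```python
def in_both(word1, word2):
    """Print each character of word1 that also appears in word2."""
    for c in word1:
        if c in word2:
            print(c)

in_both('apples', 'oranges')  # prints a, e and s on separate lines
```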

Let's rewrite the above function to return a string with all the common characters instead.
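One possible version that returns a string. The name common() is hypothetical, and skipping characters already collected is a design choice made here so that each common character appears once:

```python
def common(word1, word2):
    """Return the characters of word1 that also appear in word2, without repeats."""
    result = ''
    for c in word1:
        if c in word2 and c not in result:
            result = result + c   # builds a new string each time
    return result

print(common('apples', 'oranges'))  # aes
```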

String Comparisons

When comparing strings be mindful of this: "case matters". All uppercase letters order before all lowercase letters, so 'Z' comes before 'a'. This is because strings are compared character by character using each character's numeric code (its ASCII value for these letters).

Let's look at some examples.
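A few comparisons, using sample words chosen for illustration:

```python
same_case = 'apple' < 'banana'   # True: 'a' orders before 'b'
mixed_case = 'Zebra' < 'apple'   # True: 'Z' (90) orders before 'a' (97)
equal = 'Apple' == 'apple'       # False: 'A' and 'a' are different characters

print(same_case, mixed_case, equal)  # True True False
print(ord('Z'), ord('a'))            # 90 97
```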

To compare strings using traditional ordering, where 'a' and 'A' are considered equal, convert both strings to the same case first.
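A sketch using the standard .lower() method on both operands; the variable names are assumptions:

```python
a = 'Zebra'
b = 'apple'

raw = a < b                        # True: uppercase 'Z' orders before 'a'
folded = a.lower() < b.lower()     # False: 'zebra' orders after 'apple'
same = 'Apple'.lower() == 'apple'.lower()   # True

print(raw, folded, same)  # True False True
```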

Practice problems

Create a separate Python source file (.py) in VSC to complete each exercise.


p1: palindrome.py

A string slice can take a third index that specifies the "step size"; that is, the number of spaces between successive characters. A step size of 2 means every other character; 3 means every third, etc.

fruit = 'banana'
fruit[0:5:2]        # 'bnn'

A step size of -1 goes through the word backwards, so the slice [::-1] generates a reversed string.
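A quick illustration of the step argument on the same sample string. The variable names are introduced for this sketch:

```python
fruit = 'banana'

every_other = fruit[::2]       # 'bnn': offsets 0, 2 and 4
reversed_fruit = fruit[::-1]   # 'ananab': the whole string, backwards

print(every_other, reversed_fruit)  # bnn ananab
```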

Use this idiom to write a one-line version of is_palindrome() from practice problem p2 from the functionsReturn discussion.


p2: caesarCipher.py

A Caesar cypher is a weak form of encryption that involves "rotating" each letter by a fixed number of places. To rotate a letter means to shift it through the alphabet, wrapping around to the beginning if necessary, so 'A' rotated by 3 is 'D' and 'Z' rotated by 1 is 'A'.

To rotate a word, rotate each letter by the same amount. For example, 'cheer' rotated by 7 is 'jolly' and 'melon' rotated by -10 is 'cubed'. In the movie 2001: A Space Odyssey, the ship computer is called HAL, which is IBM rotated by -1.

Write a function called rotate_word() that takes a string and an integer as parameters, and returns a new string that contains the letters from the original string rotated by the given amount.

You might want to use the built-in function ord(), which converts a character to a numeric code, and chr(), which converts numeric codes to characters. Letters of the alphabet are encoded in alphabetical order, so for example:

ord('c') - ord('a') results in 2

Because 'c' is the two-eth letter of the alphabet. But beware: the numeric codes for uppercase letters are different.

Potentially offensive jokes on the Internet are sometimes encoded in ROT13, which is a Caesar cypher with rotation 13. If you are not easily offended, find and decode some of them.
