Dictionaries

Dictionaries are one of the best features Python has to offer. They act as the building blocks for many efficient and elegant algorithmic solutions.

Map, hash table, associative array are other common names used in other languages to describe the same concept.

A Dictionary is a Mapping

A dictionary is a collection of key:value pairs, collectively known as items.

A key can be any immutable type, while the value it maps to can be of any type.

A dictionary is very efficient at retrieving a value given its key.

The keys are unique, so no duplicates are allowed.

Create an empty dictionary

You can create a dictionary using the dict() function or its shortcut notation using {}.

Initialize a dictionary

You can initialize a dictionary when you create it by proviing key:value pairs.

Add an item

To add an item use the square brackets []:

d['key'] = value

Since version 3.7 dictionary order is guaranteed to be insertion order.

Traditionally a mapping structure is unordered.

Access an item

To access an item use [], with the key as its argument. If the key is not found an error will occur.

len()

Use the len() function to find out how many elements a dictionary holds.

in

The in operator checks for key containment, i.e. if a key is present.

values()

The question often is whether a given value is within the dictonary. For that use the values() function along with the in operator.

Recall, a dictionary is great for locating keys quickly and efficiently.

The reason has to do with its internal structure as a hash table. Basically, each key search takes the same amount of time no matter how many items are in the dictionary.

Searching a list on the other hand will depend on its length, since each element is searched in order. Thus, accessing the first element will take less time than accessing the last.

We say that a dictionary search is O(1), denoting constant time, while a list search is O(n) denoting a linear time. O here refers to the growth Order of time propotional to size.

The concept of algorithm analysis is studied elsewhere.

Counting with dictionaries

If I ask you to count how many times each character appears in a string, such as 'Hello there.' how would you go about it?

Let's examine some options:

  1. Use 26 variables, each acting as a counter for each letter as you traverse the string.

  2. Use a list, each element holding a count. Map each character to its index using ord(), i.e. ord('A') returns 65 which is 'A's ASCII value.

  3. Use a dictionary with the key representing the character and the value its count. Add a newly found character with a count of 1, else increment the count of an existing character.

Which implementation seems better?

Although each does the same thing, each is implemented differently and offers different advantages.

Both the variable and list options require knowledge of the characters used in order to create appropriate counters. The list however may waste elements for those characters that don't appear in the string, since they would still be part of the setup.

A dictionary on the other hand is more efficient in this regard, since we would only add the characters we encounter, not to mention the efficient access to each.

The function histogram() below returns a dictionary with the character frequencies. That is what a histogram is in essence; a collection of frequencies (counters).

get()

A dictionary offers the get() method that takes a key and a default value. The idea is that if the key exists, its value is returned, else the default value is returned instead.

Let's rewrite the histogram function above to make use of this method and simplify the code in the process.

The function still traverses the string as before, but the conditional logic is now removed.

Line 4 uses the get() method to add 1 to the current count of a contained character or 0 otherwise.

Traversing a dictionary

Use a for along with in to traverse each key.

sorted()

If you want to display the keys in sorted order, use the sorted() function.

Reverse Lookup

A direct lookup is easy. Given a key k and a dictionary d, then its value can be accessed with v = d[k].

The interesting question is "If I have a value, what is its key?".

This is a bit tricky since multiple keys may map to the same value. Remember, keys are unique, not the values.

So, because of this fact, there is no easy reverse lookup. You will have to search.

The raise statement is part of exception handling, and we shall look at that later.

Dictionaries and Lists

Inverting a dictionary

Our histogram stores the frequency of occurrence of each letter. But, what if we were interested in the reverse, i.e. a list of all the letters with a given frequency.

Since a dictionary's value can be any type, we can use a list to hold all the letters that share the same frequency.

Let's look at how we can do this.

Here is the state diagram of the dictionary and the inverted dictionary.

inverted dictionary

A dictionary uses a hashtable and thus the keys must be hashable.

A hash function basically maps a key to an integer, which is then used to access a key:value pair.

As this mapping only works if the hash function produces the same integer for a given key, it stands to reason that if you change the key, the hash function will return a different integer. Well, that is not good.

So, a key has to be immutable for all this to work. Let's see what happens if it is not.

Memos

A memo is a previously computed value that is stored for later use.

The fibonacci series is a good example of where we can make use of this idea. The larger the requested number the more duplication of terms exists.

Look at the call graph for fibonacci(4) to appreciate the unfortunate redundancy.

fibonacci call graph

Since many terms are computed repeatedly (0, 1, 2...) we can use a dictionary to store each computed term the first time, and then simply retrieve it on subsequent calls.

Here's a memoised version of the fibonacci() function.

Global Variables

known is created outside of the function, so it belongs to the __main__ frame. Another way to say this is that it is part of the main module's symbol table.

As such it is often called a global variable since it can be read from any function. To write to a global variable you must declare it inside a function with the global keyword, else a new local variable with the same name is created instead.

Let's look at this concept closely.

Notice how the global variable was superseded by a local variable of the same name, thus its value was never updated.

Now look at the next example that actually does update its value.

It should be noted that in Python everything is an object, and so variables are object references. So, if you have a mutable object, you can update the instance easily, as we saw with the known dictionary.

Practice problems

Create a separate Python source file (.py) in VSC to complete each exercise.

Download the file CROSSWD.TXT and rename it words.txt. This file includes official crosswords and you will need it for the exercises.


p1: wordDictionary.py

Write a function that reads the words in words.txt and stores them as keys in a dictionary. It doesn't matter what the values are. Then you can use the in operator as a fast way to check whether a string is in the dictionary or not.


p2: invertDictionarySetDefault.py

Read the documentation of the dictionary method .setdefault() and use it to write a more concise version of the invertDict() function in this discussion.


p3: hasDuplicates.py

Write a function named hasDuplicates() that takes a list as a parameter and returns True if there is any object that appears more than once in the list.

Use a dictionary for a fast version of this function.


p4: rotatePairs.py

Two words are rotate pairs if you can rotate one of them and get the other (Refer to p2: caesarCipher.py in the strings discussion).

Write a program that reads a wordlist and finds all the rotate pairs.

[Lists] [TOC] [Tuples]