JSON bites Python

21st of March, 2017 Programming Python


Now that we know how to read and write JSON with Python, it might be interesting to learn how to delete some elements. Most people consume third-party JSON APIs, where deleting objects is not allowed: we just download the data we need. However, when we build our own JSON dictionaries, adding new data can introduce typos or errors, and editing a raw JSON file by hand can quickly become a nightmare. Being able to delete a specific object, and to do it quickly, is therefore a useful feature to build into a project. When I was developing MoniMoney, I didn’t find much about this, probably for the reasons I just gave. So I will try to provide a tutorial on this delete feature. I will first introduce the concept, then show how I tried to improve it with a solution based on indexes, which let me delete several objects at once, and quickly.

What does the Pop say

The process for deleting JSON objects is simple and was well explained here. In short, we iterate over all the available items until we find the right "key": "value" pair. Then we can use a method such as pop() to remove it. Such a function could be:

import json
import sys

def delete_travel():
    # Opening JSON data
    with open("mydata.json") as json_file:
        json_loaded = json.load(json_file)

    # Getting length of all objects
    items = len(json_loaded["flights"])
    if items == 0:
        print "[Error]: no travel available in this category!"
        sys.exit(1)

    # Current JSON data
    print json_loaded["flights"]

    # Removing the first object whose destination matches
    for n in xrange(items):
        if json_loaded["flights"][n]["destination"] == "Seattle":
            json_loaded["flights"].pop(n)
            break

    # Updated JSON data
    print json_loaded["flights"]

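Just by printing the JSON data, we can see that the flight {'date': '2017-03-18', 'destination': 'Seattle', 'departure': 'London'} has been removed. As we are working with a list, there are other ways to remove the element at index x: del json_loaded["flights"][x] works on the index, remove() takes the element itself rather than its index, and filter() with a lambda builds a new list without the unwanted items.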
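As a rough illustration of those alternatives, here is a minimal sketch written for Python 3 (the code in this post is Python 2). The flights list is sample data made up for the example, and each variant works on its own copy so the results can be compared.

```python
import copy

# Hypothetical sample data, mirroring the structure used in this post
flights = [
    {"date": "2017-03-18", "departure": "London", "destination": "Seattle"},
    {"date": "2017-03-20", "departure": "Seattle", "destination": "London"},
]

# del removes the element at a given index
with_del = copy.deepcopy(flights)
del with_del[0]

# remove() takes the element itself, not its index
with_remove = copy.deepcopy(flights)
with_remove.remove(flights[0])

# filter() with a lambda keeps everything that does not match
with_filter = list(filter(lambda f: f["destination"] != "Seattle", flights))

print(with_del == with_remove == with_filter)
```

Note that del and remove() change the list in place, while filter() leaves the original list untouched and returns a new one.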

However, I’m not keen on this method. The first reason is that it removes only the first match Python finds in the list. So if I have the two flights 'destination': 'Seattle', 'departure': 'London' and 'destination': 'Seattle', 'departure': 'Paris', only the one departing from London will be removed. To remove the one coming from Paris, I would have to add an and json_loaded["flights"][n]["departure"] == "Paris" condition to the if statement. Not very handy, is it?

Another drawback is that with multiple nested dictionaries, it becomes tricky to remove a specific element; writing all the fors and ifs can take a while.

Finally, I don’t like hard-coding values in my code. Obviously raw_input() can be used to avoid that, but typing city names or long strings is still a source of mistakes and frustration.

Designing Mark II with indexes

To avoid that, using indexes seems to be a good solution. I think it’s easy to understand and intuitive. What we want is to list all the objects with an index and ask the user which one to remove. A simple number is then all it takes to complete the process. Because JSON objects can easily span many lines, writing a function dedicated to giving a glimpse of an item is time well spent. Furthermore, asking for a specific category speeds things up. To start with, we can just focus on flights.

An example of a function giving an overview of an object could be as follows:

def overview(travel):
    tdate = travel["date"]
    tdest = travel["destination"]
    tdepa = travel["departure"]

    line = "Date: %s, From: %s, To: %s" % (tdate, tdepa, tdest)
    return line

We can even pad the fields with the rjust() method for nicer output. We call overview() in our for loop. By adding an index before printing each object, we get the following loop and output:

for n in xrange(items):
    print "%d -> %s" % (n, overview(json_loaded["flights"][n]))

0 -> Date: 2017-03-18, From: London, To: Seattle
1 -> Date: 2017-03-20, From: Seattle, To: London
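The rjust() padding mentioned above could look like the following sketch, written for Python 3. The overview_padded() name and the width parameter are assumptions made for the example; they are not part of the original code.

```python
def overview_padded(travel, width=10):
    """Return a one-line summary with right-aligned city names."""
    return "Date: %s, From: %s, To: %s" % (
        travel["date"],
        travel["departure"].rjust(width),
        travel["destination"].rjust(width),
    )

# Hypothetical sample data
flights = [
    {"date": "2017-03-18", "departure": "London", "destination": "Seattle"},
    {"date": "2017-03-20", "departure": "Seattle", "destination": "London"},
]

for n, flight in enumerate(flights):
    print("%d -> %s" % (n, overview_padded(flight)))
```

Picking a width at least as long as the longest city name keeps the columns aligned from one line to the next.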

A quick aside here. It might be interesting to specify an order, from the newest to the oldest for instance. In MoniMoney, I used this approach because it let me see the most recent bank transfers at the top and type 0 most of the time instead of a huge value like 481516. To do that, I suggest you define the date field first in your objects and use the reversed() function in your loop(s) like this:

for n in reversed(xrange(items)):
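reversed() only flips the order in which the objects are stored. If they are not stored chronologically, one possible variation is to sort on the date field while keeping each object's original index, so that the number shown to the user is still valid for pop(). This is a sketch for Python 3; it assumes dates are written as YYYY-MM-DD strings, which sort correctly as plain text, and the newest_first() name is made up for the example.

```python
# Hypothetical sample data, deliberately out of order
flights = [
    {"date": "2017-03-18", "departure": "London", "destination": "Seattle"},
    {"date": "2017-03-20", "departure": "Seattle", "destination": "London"},
    {"date": "2017-03-19", "departure": "Paris", "destination": "Seattle"},
]

def newest_first(items):
    """Return (original_index, item) pairs sorted from newest to oldest."""
    return sorted(enumerate(items), key=lambda pair: pair[1]["date"], reverse=True)

for index, flight in newest_first(flights):
    print("%d -> %s" % (index, flight["date"]))
```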

Now comes the actual removal. We use the same process as before, combined with a raw_input() and some value checks.

# Removing index
while True:
    index = raw_input("Select an index: ")
    index = "".join(index.split())  # Dropping any whitespace
    # Checking the index is a number and is in the list
    if index.isdigit() and 0 <= int(index) < len(json_loaded["flights"]):
        json_loaded["flights"].pop(int(index))
        break
    print "[Error] Incorrect index!"
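The value checks can also be pulled out of the loop into a small function, which makes them easy to test without typing anything. The sketch below is for Python 3; parse_index() is a hypothetical helper, not part of the original code, and it uses isdecimal() so that only characters int() can convert are accepted.

```python
def parse_index(text, size):
    """Return text as a valid index for a list of length size, or None."""
    cleaned = "".join(text.split())  # Dropping any whitespace
    if not cleaned.isdecimal():
        return None
    index = int(cleaned)
    return index if 0 <= index < size else None

print(parse_index(" 1 ", 3))
print(parse_index("7", 3))
```

The prompt loop then only has to call input() again for as long as the function returns None.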

The last step of the process is to write our updated JSON data.

import json
import subprocess

sheetname = "./mydata.json"         # JSON data
sheetname_bak = sheetname + ".bak"  # JSON data backup

# Creating a backup file and removing original data
print "Creating backup file..."
cp = subprocess.call(["cp", sheetname, sheetname_bak])
print "Removing original file..."
rm = subprocess.call(["rm", sheetname])

# Creating a new file and writing updated data
print "Writing updated data..."
with open(sheetname, 'w') as sheet:
    json.dump(json_loaded, sheet)

# Removing backup file
print "Removing backup file..."
rm = subprocess.call(["rm", sheetname_bak])
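The cp and rm calls only work where those commands exist. A portable variation, sketched here for Python 3 with the standard shutil and os modules, could look like this. The save_with_backup() name is an assumption made for the example, and since opening a file with "w" already truncates it, the sketch skips the separate removal of the original file.

```python
import json
import os
import shutil
import tempfile

def save_with_backup(data, path):
    """Back up path, write data to it as JSON, then drop the backup."""
    backup = path + ".bak"
    shutil.copy2(path, backup)      # Same role as the cp call
    with open(path, "w") as sheet:  # Opening with "w" truncates the file
        json.dump(data, sheet)
    os.remove(backup)               # Same role as the final rm call

# Demo on a throwaway file in a temporary directory
workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "mydata.json")
with open(demo_path, "w") as demo_file:
    json.dump({"flights": [1, 2]}, demo_file)
save_with_backup({"flights": []}, demo_path)
```

If json.dump() raises, the function stops before os.remove(), so the backup file is left in place and can be restored by hand.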

Ready for the show

Our small proof of concept seems to work, but we can go further. Instead of focusing on just one category, flights in our example, we can handle all the others too. In our case, we add cabs. We could print every JSON object, but asking the user for a quick selection first avoids a never-ending listing: the user just types flights or cabs and all the corresponding objects are printed.

To do so, we can either define the values literally or create a small function. Note that the function returns only the top-level keys. To get all the nested keys, another approach is required.

import json

# Option 1: defining available categories literally
categories = ["cabs", "flights"]

# Option 2: reading them from the file
def find_categories():
    # Opening JSON data
    with open("mydata.json") as json_file:
        json_loaded = json.load(json_file)

    # Defining the list of categories
    categories = []

    # Appending all available categories (top-level keys only)
    for key in json_loaded:
        categories.append(key.encode("utf-8"))

    # Sorting keys (dictionary keys are already unique)
    categories.sort()

    return categories
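For the nested keys, one possible approach is a recursive walk over dictionaries and lists. The sketch below is for Python 3; the all_keys() name and the dotted path format are choices made for the example, not something the original project defines.

```python
def all_keys(node, prefix=""):
    """Collect the dotted path of every key found in nested dicts and lists."""
    keys = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + "." + key if prefix else key
            keys.append(path)
            keys.extend(all_keys(value, path))
    elif isinstance(node, list):
        # List items share the path of the list that holds them
        for item in node:
            keys.extend(all_keys(item, prefix))
    return sorted(set(keys))

# Hypothetical sample data
data = {
    "cabs": [{"date": "2017-03-18"}],
    "flights": [{"date": "2017-03-18", "departure": "Paris"}],
}
print(all_keys(data))
```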

Then a prompt for the category with some value checks.

# Asking for a category
while True:
    cat = raw_input("Select a category: ")
    cat = str(cat).lower().strip()
    # Checking the category exists
    if cat not in categories:
        print "[Error] Category not found!"
        # Printing available categories
        print "Available categories:"
        for c in categories:
            print "  * " + c
    else:
        break

Now we can even try to add another feature: deleting several objects at the same time.

Dot the i’s and use multiple indexes

We could stop here, but we can also keep going. As we saw, one of the main advantages of using indexes and filtering by category is speed. We can push that further: instead of deleting one object, we can expand our code to remove several items at the same time.

To proceed, we need to modify the main while loop that handles the index. With more loops involved, several break statements would be required. Because a break in Python only exits the innermost loop, we first define a dumb function whose only job is to check whether every index fits in a given list. The argument indexes is a list of integer values and objects is the list of JSON items of a category. It returns -1 if any tested index is outside the list, 0 otherwise.

def check_inside(indexes, objects):
    for i in indexes:
        if 0 <= i < len(objects):
            continue
        else:
            return -1
    return 0
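The same check can be written more compactly with the built-in all(). This is a sketch for Python 3, and it returns a boolean instead of 0 or -1; the indexes_valid() name is made up for the example.

```python
def indexes_valid(indexes, objects):
    """Return True when every index points inside objects."""
    return all(0 <= i < len(objects) for i in indexes)

print(indexes_valid([0, 2], ["a", "b", "c"]))
print(indexes_valid([0, 3], ["a", "b", "c"]))
```

Keep in mind that all() returns True for an empty list of indexes, so an empty selection has to be rejected separately.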

And we modify our process. First we define a list, selection, which is our arbitrary collection of indexes. We split the input, strip useless spaces and perform additional checks and conversions. We remove potential duplicates from our collection of indexes and sort it. PEBCAK, you know. Then we apply our dumb check_inside() function and sort our list again, this time in reverse order, to prevent index out of range errors, since popping a low index first would shift every item after it. Finally we apply pop() to remove each object.

# Removing indexes
while True:
    index = raw_input("Select at least 1 index (add ',' for multiple indexes): ")
    selection = []              # Defining final selection
    indexes = index.split(',')  # Splitting all the indexes

    for i in indexes:
        i = i.strip()                 # Removing white spaces
        if i.isdigit():               # Checking it's a digit
            selection.append(int(i))  # Converting and appending to the selection

    # Removing any duplicates and sorting indexes (once, after the loop)
    selection = sorted(set(selection))

    # Checking at least one index was given and all of them are in the list
    if selection and check_inside(selection, json_loaded[cat]) == 0:
        break
    else:
        print "[Error] An incorrect index has been found!"

# Removing objects with the final selection
selection.sort(reverse = True)  # Using reverse order to avoid out of range
for s in selection:
    json_loaded[cat].pop(s)
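The parsing and the removal can also be split into two small functions, which keeps the prompt loop short and makes both steps testable. This is a sketch for Python 3 and both function names are hypothetical. Unlike the loop above, parse_selection() rejects the whole input when one entry is not a number instead of silently skipping it, which is a deliberate design choice for the example.

```python
def parse_selection(text):
    """Turn "0, 2,2" into [0, 2]; return None if any entry is not a number."""
    selection = set()
    for part in text.split(","):
        part = part.strip()
        if not part.isdecimal():
            return None
        selection.add(int(part))
    return sorted(selection)

def delete_indexes(objects, selection):
    """Pop the selected indexes in place, highest first, and return the removed items."""
    if not selection or not all(0 <= i < len(objects) for i in selection):
        raise IndexError("selection is empty or out of range")
    # Highest index first, so earlier pops do not shift the remaining targets
    return [objects.pop(i) for i in sorted(set(selection), reverse=True)]

items = ["a", "b", "c", "d"]
removed = delete_indexes(items, parse_selection("0, 2,2"))
print(items, removed)
```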

The output of this new functionality is:

Select a category: flights
2 -> Date: 2017-03-20, From: Seattle, To: London
1 -> Date: 2017-03-18, From: London, To: Seattle
0 -> Date: 2017-03-18, From: Paris, To: Seattle
Select at least 1 index (add ',' for multiple indexes): 0, 1, 2

On stage

The following lines will show the result before and after we apply our delete() function.

{
  "cabs": [
      {
        "date": "2017-03-18",
        "departure": "Seattle-Tacoma International Airport",
        "destination": "Flat"
      }
  ],
  "flights": [
      {
        "date": "2017-03-18",
        "departure": "Paris",
        "destination": "Seattle"
      },
      {
        "date": "2017-03-18",
        "departure": "London",
        "destination": "Seattle"
      },
      {
        "date": "2017-03-20",
        "departure": "Seattle",
        "destination": "London"
      }
  ]
}

{
  "cabs": [
      {
        "date": "2017-03-18",
        "departure": "Seattle-Tacoma International Airport",
        "destination": "Flat"
      }
  ],
  "flights": [
  ]
}

A good scrolling leads to a great terminal and a happy life

For Linux/Unix and CLI lovers only. I would like to conclude this post with a final trick. It is not related to JSON but can be handy in such situations. As I use the terminal a lot, I sometimes need to read a long file in it. I’m using rxvt-unicode for a couple of good reasons, and one of them is scrolling. By default, when you are reading a long file, your terminal automatically scrolls down to the bottom. It is possible to modify this behaviour in urxvt by changing the following parameters:

They are set in .Xresources, located by default in your home directory. If the file doesn’t exist, create it, and run the xrdb ~/.Xresources command every time you change it. This thread might be helpful for modifying the scrolling behaviour, and this wiki explains how to set up X resources. Also feel free to have a look at my dotfiles, especially the urxvt_term file.