JSON

JSON (JavaScript Object Notation) is a data format that is designed for human readability and writeability.

Below is an example JSON file for college football teams from the 2022 season. I only included two conferences and three teams for space, but this helps illustrate some key advantages:

{
  "conferences": [
    {
      "abbreviation":  "ACC", 
      "id": 1,
      "name":  "Atlantic Coast Conference",
      "founded": 1953,
      "size": 14,
      "hasDivisions": true,
      "teams": [
        {
          "id": 1,
          "name": "University of Virginia",
          "abbreviation": "UVA",
          "mascot": "Cavaliers",
          "division": "ACC Coastal",
          "record": {
            "totalWins": 3,
            "totalLosses": 7,
            "confWins": 1,
            "confLosses": 6,
            "winPercent": 0.3
          }
        },
        {
          "id": 2,
          "name": "Virgina Tech",
          "abbreviation": "VT",
          "mascot": "Hokies",
          "division": "ACC Coastal",
          "record": {
            "totalWins": 3,
            "totalLosses": 8,
            "confWins": 1,
            "confLosses": 6,
            "winPercent": 0.272727272
          }
        }
      ]
    },
    {
      "abbreviation":  "Big XII",
      "id": 2,
      "name":  "Big 12 Conference",
      "founded": 1994,
      "size": 14,
      "hasDivisions": false,
      "teams": [
        {
          "id": 3,
          "name": "West Virginia University",
          "abbreviation": "WVU",
          "mascot": "Mountaineers",
          "record": {
            "totalWins": 5,
            "totalLosses": 7,
            "confWins": 3,
            "confLosses": 6,
            "winPercent": 0.416666667
          }
        }
      ]
    }
  ]
}

Benefits

The above structure allows for JSON to be:

  • Structured, but flexible: CSV files are easy to parse because they are structured. However, because columns are static in a CSV file, the columns semantics are tied to a column number, or single column name. Here, we can easily add data to the data (such as adding “city” to each team) without worrying about breaking any code that uses any existing labels.
  • Nesting of data: JSON nests data in a way CSV cannot easily do. This JSON to support structures that CSV cannot easily support, namely lists and maps (aka dictionaries). In fact, the way JSON represents lists and dictionaries as a data type matches JSON exactly (this is also true for Python).
  • Human readability: the data being in a human-readable format keeps the semantics of the data front and center when parsing the data.
  • Plays nicely with objects: the intent of JSON is that we can take objects, including their fields, and easily translate it to a textual representation of the data (the JSON String), and then translate the JSON String back into an Object such that the object because easy to transfer over networks/pipelines/sub-systems.

While we aren’t discussing XML right now, be aware the XML and JSON (and YAML) all support these features, and are often used in the same way to transmit data. JSON is, to the best of my reckoning, the most popular format right now (as of 2023). XML was the standard for much of the 2000s. But while the syntax of each file is a little different, the actual representation of the data is basically the same: organizing data into a nested structure of lists/maps.

Data types

JSON supports the following datatypes: 1) integers - whole numbers - 5 2) floats - decimal numbers - 3.5 3) strings - text bounded by "quotation marks" 4) booleans - true or false 5) lists - ordered data bounded by square brackets. For example, [8, 6, 7, 5, 3, 0, 9] 6) maps - key-value pairs bounded by curly braces where every key must be a string, but the value can be any datatype; For example: {"name": "Will McBurney", age:35}

You can, and almost always will, have things like a map having a list or map value, or your data being stored as a list of maps, etc.

Example structure

Specifically, the above json example is structured as follows:

The outermost structure is a map with one key: “conferences”. This is a map because of the use of curly braces: { }. The key “conferences maps to a list (denoted by square braces, [ ]) of conference object.

Each conference object is itself a map with the keys: 1) “abbreviation”: The conference’s abbreviation (string) 2) “id”: an ID number unique to each conference (integer) 3) “name”: the full name of the conference (string) 4) “founded”: the year the conference was founded (integer) 5) “size”: The current size of the conference (for football members, in this case) (integer) 6) “hasDivisions”: A boolean value denoting whether the conference has explicit divisions (boolean) 7) “teams”: A list of team objects in the conference

Example of a conference object:

{
  "abbreviation": "ACC",
  "id": 1,
  "name": "Atlantic Coast Conference",
  "founded": 1953,
  "size": 14,
  "hasDivisions": true,
  "teams": []
}

Each team object is a map which has the following keys: 1) “id” - a unique id for each team 2) “name” - the name of the team’s institution. 3) “abbreviation” - the common abbreviation of that team 4) “mascot” - the team’s nickname 5) “division” - the division that team is in (note that this may or may not be present depending on the conferences “hasDivisions” value) 6) “record” - an object (represented by map) that states that team’s win/loss record

Example of a team object

{
  "id": 2,
  "name": "Virgina Tech",
  "abbreviation": "VT",
  "mascot": "Hokies",
  "division": "ACC Coastal",
  "record": {
    "totalWins": 3,
    "totalLosses": 8,
    "confWins": 1,
    "confLosses": 6,
    "winPercent": 0.272727272
  }
}

Stepping through object

Let’s say I knew I wanted to get the University of Virginia’s conference wins and losses from the above data. For the sake of this example, we’ll say I know ahead of time that UVA is in the ACC. Here is how I would step through the JSON data (note that we will show how to do this with Java in the next unit, for now we’re sticking to a language agnostic approach).

The code snippets shown are for a general “python like” language, and aren’t meant to be viewed as literal code, but as psuedocode

1) Starting with the overall JSON Object, get the list from the key conferences: conferenceList = myJsonObject["conferences"] 2) From conferences, step through each conference one at a time until you find the conference that has ACC for abbreviation, and get that index (in this case, index 0): accConference = conferenceList[0] 3) Get the team list from the conference: accTeams = accConference["teams"] 4) Iterate through the list, getting the team whose name is “University of Virginia” (in this case, index 0): uvaTeam = accTeams[0] 5) Get the record object with the key “record”: uvaRecord = uvaTeam["record"] 6) Get the total wins from the record object with key “totalWins”: totalWins = uvaRecord["totalWins"] 7) Get the total losses in the same way: totalLosses = uvaRecord["totalLosses"]

Regardless what language you use, parsing JSON is done in this fashion, from the outside-in. Keep this in mind as we show Java code for parsing JSON in the next unit.


Previous submodule:
Next submodule: