Cisco IOS XE Catalyst 9000 Switches Upgrade using Ansible

In this blog post, we will discuss how to upgrade Cisco IOS XE on Catalyst 9000 Series Switches using Ansible, a well-known open-source automation tool. By taking advantage of Ansible's user-friendly playbook structure, we can make the upgrade process easier, reduce downtime, and maintain uniform IOS versions throughout our entire network.

This post will walk you through the steps to upgrade your Cisco IOS XE Catalyst 9000 switches smoothly and effectively, whether you are new to Ansible or already familiar with it. Let's get started and see how Ansible can improve your network management tasks.

Disclaimer [Proceed with Caution]

Please note that it is essential to test this playbook in a controlled test environment before implementing it in your production network. While I have thoroughly tested this playbook on my test Cisco Catalyst 9300 switch, it is crucial for you to ensure compatibility and verify the process within your specific network setup. Network environments may vary, and unforeseen issues can arise. By testing in a non-production environment first, you can identify and address any potential problems before applying the upgrade to your live network.

Prerequisites

Before diving into the upgrade process, it's essential to ensure that your environment is set up correctly. Here are some prerequisites to consider.

  1. SSH Access - Ensure that your Ansible node has proper SSH access to the Cisco Catalyst 9000 switches.
  2. Install Mode - The playbook uses 'Install' mode to perform the upgrade. If your switch still runs in 'Bundle' mode, the playbook will not work.
  3. FTP Server - Make sure that the desired IOS XE file is already uploaded to your FTP server. Ansible will fetch the file from there to perform the upgrade.
  4. SCP vs FTP - In my initial tests, I attempted to use SCP for transferring the IOS XE file. However, the transfer was very slow, taking around 10 minutes for a 50 MB file. FTP, by comparison, took around 10 minutes to transfer the entire 1 GB file, so I switched to FTP.
  5. Upgrade Version - This example will demonstrate the upgrade process for Cisco IOS XE from version 17.06.03 to 17.06.05. The steps outlined in this blog post should be applicable to other version upgrades as well, but make sure to confirm compatibility for your specific use case.
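
The Install mode prerequisite can be checked by Ansible before anything is changed. The tasks below are a minimal sketch and are not part of the playbook I tested. They assume that 'show version' on your platform prints the boot mode (INSTALL or BUNDLE), and boot_mode is just a name picked for the registered variable.

```yaml
# Sketch only: a pre-check that could run before the upgrade block.
# Assumption: 'show version' lists the boot mode (INSTALL or BUNDLE) on this platform.
- name: Read the current boot mode
  cisco.ios.ios_command:
    commands:
      - show version | include INSTALL|BUNDLE
  register: boot_mode

- name: Stop if the switch is not in Install mode
  ansible.builtin.assert:
    that:
      - "'INSTALL' in boot_mode.stdout[0]"
    fail_msg: "Switch is not in Install mode, convert it before running the upgrade."
```

Check the output of 'show version' on your own switch first and adjust the match if your release prints the mode differently.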

Understanding the Manual Upgrade Process

Before diving into the automated upgrade using Ansible, it is crucial to have a solid understanding of the manual upgrade process for Cisco IOS XE on Catalyst 9000 Series Switches. Familiarity with the manual process helps ensure that you can troubleshoot issues, understand the steps being automated, and adapt the playbook as needed.

If you are not familiar with the manual upgrade procedure, automating the process might not be as beneficial or effective. For those who need guidance on performing the upgrade manually, please refer to my other blog post, which covers the step-by-step process for a manual upgrade of Cisco IOS XE on Catalyst 9000 Series Switches - https://www.packetswitch.co.uk/cisco-9300/

Once you are comfortable with the manual upgrade process, you can proceed with confidence in automating the procedure using Ansible.

Inventory and Group Variables

Before executing the playbook, it's crucial to define the inventory and group variables to establish proper communication with your switches and FTP server.

# inventory.ini
[switches]
test_switch ansible_host=10.1.12.10

# group_vars/switches.yml
---
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
ansible_become: yes
ansible_become_method: enable
ansible_user: admin
ansible_password: Cisco123
ansible_become_password: Cisco123
ftp_ip: 10.10.10.10
ftp_username: username
ftp_password: password

In the inventory file inventory.ini, we create a group called switches and list the target switch under this group. In this example, test_switch is the target device with an IP address of 10.1.12.10.

In the group variables file group_vars/switches.yml, we define the necessary parameters for connecting to the Cisco switches using the network_cli connection plugin. We also specify the user credentials and enable passwords, as well as the FTP server's IP address and login credentials.

Make sure to replace the sample values in these files with the actual credentials and IP addresses relevant to your network environment. With the inventory and group variables properly configured, you can proceed to run the Ansible playbook for the upgrade process.

💡
Important Note: Storing plain-text passwords in your group variables, as shown in this example, is not recommended for security reasons. Instead, it's advisable to use a tool like Ansible Vault to encrypt sensitive data, such as passwords, ensuring that your network remains secure.
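
One common way to do this is to keep the real secrets in a separate, encrypted file and only reference them from the plain-text variables. The layout below is a sketch under a few assumptions: the group variables are moved into a group_vars/switches/ directory, and the vault_ prefixed names are hypothetical names chosen for this example.

```yaml
# group_vars/switches/vars.yml - plain text, only references the vaulted values
ansible_password: "{{ vault_ansible_password }}"
ansible_become_password: "{{ vault_ansible_become_password }}"
ftp_password: "{{ vault_ftp_password }}"

# group_vars/switches/vault.yml - a separate file, encrypted with:
#   ansible-vault encrypt group_vars/switches/vault.yml
vault_ansible_password: "replace-me"
vault_ansible_become_password: "replace-me"
vault_ftp_password: "replace-me"
```

With the file encrypted, the playbook run needs the vault password, for example by adding --ask-vault-pass to the ansible-playbook command.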

Playbook

---
- name: 9300 Upgrade
  hosts: switches
  gather_facts: no
  vars:
    new_version: 17.06.05
    new_file: "cat9k_iosxe.{{ new_version }}.SPA.bin"

  tasks:
    - name: Gather facts
      cisco.ios.ios_facts:
        gather_subset: hardware
      register: facts_output
    
    - name: print output
      debug:
        msg: "Current Version is {{ facts_output['ansible_facts']['ansible_net_version'] }}"
    
    - name: Save the Config
      cisco.ios.ios_command:
        commands: 
          - command: 'copy running-config startup-config'
            prompt: 'Destination filename \[startup-config\]\?'
            answer: "\r"
      register: save_config
    
    - name: Save config output
      debug:
        msg: "{{ save_config }}"
    
    - name: Start the Upgrade Process
      block:
      - name: Check for old files and remove them if found
        cisco.ios.ios_command:
          commands:
            - command: 'install remove inactive'
              prompt: 
                - 'Do you want to remove the above files\? \[y/n\]'
              answer:
                - 'y'
        register: install_remove_output
        vars:
          ansible_command_timeout: 600

      - name: Display install_remove output and result
        debug:
          msg: "{{ 'No old files found. Nothing to clean.' if 'SUCCESS: No extra package or provisioning files found on media' in install_remove_output.stdout[0] else 'Old files removed successfully.' }}"
      
      - name: Copy IOS image via FTP
        cisco.ios.ios_command:
          commands:
            - command: "copy ftp://{{ ftp_username }}:{{ ftp_password }}@{{ ftp_ip }}/{{ new_file }} flash:{{ new_file }} vrf Mgmt-vrf"
              prompt:
                - 'Destination filename \[{{ new_file }}\]\?'
              answer:
                - "\r"
        vars:
          ansible_command_timeout: 900
      
      - name: Install and activate new IOS image
        cisco.ios.ios_command:
          commands:
            - command: 'install add file flash:{{ new_file }} activate commit'
              prompt: 
                - 'This operation may require a reload of the system. Do you want to proceed\? \[y/n\]'
              answer: 
                - 'y'
        register: install_activate_output
        vars:
          ansible_command_timeout: 1200
      
      - debug:
          var: install_activate_output.stdout_lines

      - name: Wait for switch to reboot and become reachable
        wait_for_connection:
          delay: 180
          sleep: 60
          timeout: 900
      
      - name: Gather new facts
        cisco.ios.ios_facts:
          gather_subset: hardware
        register: facts_output_new
      
      - name: New Version
        debug:
          msg: "New Version is {{ facts_output_new['ansible_facts']['ansible_net_version'] }}, Upgrade Successfully Completed"
        when: new_version == facts_output_new['ansible_facts']['ansible_net_version']
      
      when: new_version != facts_output['ansible_facts']['ansible_net_version']

The playbook targets the switches group defined in the inventory file. We disable the gathering of facts at the beginning, as we'll collect specific facts later in the tasks. The new IOS version and the corresponding file name are defined as variables.
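
If the switches group holds more than one device, Ansible works through the hosts in parallel by default, so several switches could reload at roughly the same time. A play header like the one below is one way to limit that. It is a sketch, not something I tested as part of this upgrade, and serial: 1 is only an example value.

```yaml
# Sketch only: same play header as the playbook, with a batch size added.
- name: 9300 Upgrade
  hosts: switches
  gather_facts: no
  serial: 1   # assumption: handle one switch per batch, pick a size that suits your network
  vars:
    new_version: 17.06.05
    new_file: "cat9k_iosxe.{{ new_version }}.SPA.bin"
  # tasks: unchanged from the playbook
```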

The tasks in the playbook include:

  1. Gather facts - Collect hardware information, such as the current IOS version, from the target switches.
  2. Save the running configuration - Save the running configuration to the startup configuration.
  3. Start the upgrade process - A block of tasks for the actual upgrade process, including:
    • Remove any old, inactive files.
    • Copy the new IOS image from the FTP server to the switch's flash storage.
    • Install and activate the new IOS image.
  4. Wait for the switch to reboot and become reachable - Pause the playbook execution until the switch is back online.
  5. Gather new facts - Collect hardware information again, including the new IOS version, after the upgrade.
  6. Verify the upgrade - Confirm that the upgrade was successful by comparing the new IOS version to the expected version.
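
The playbook installs the image straight after the copy. If you want to confirm that the file arrived intact, a check like the following could sit between the copy and the install tasks. This is a sketch: expected_md5 is a hypothetical variable that would hold the MD5 value published for the image, and md5_output is just a name for the registered result.

```yaml
# Sketch only: verify the copied image before installing it.
# Assumption: expected_md5 is a hypothetical variable holding the published MD5 of the image.
- name: Calculate the MD5 hash of the copied image
  cisco.ios.ios_command:
    commands:
      - "verify /md5 flash:{{ new_file }}"
  register: md5_output
  vars:
    ansible_command_timeout: 600

- name: Stop if the hash does not match
  ansible.builtin.assert:
    that:
      - (expected_md5 | lower) in (md5_output.stdout[0] | lower)
    fail_msg: "MD5 mismatch for {{ new_file }}, do not install this image."
```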

A few things to consider

Understanding Escape Characters

In this specific example, we use escape characters to handle the question mark ? and square brackets [] in the prompt.

commands:
  - command: 'install remove inactive'
    prompt: 
      - 'Do you want to remove the above files\? \[y/n\]'
    answer:
      - 'y'

In the context of the Ansible playbook, escape characters ensure that the prompt is correctly identified and matched during the execution of the tasks. The ios_command module treats each prompt as a regular expression and matches it against the text the switch prints. Without the backslashes, the ? would act as a quantifier and [y/n] as a character set, so the pattern would no longer describe the literal prompt, leading to unexpected behaviour or errors.

Here's a breakdown of the escape characters used in the example:

  • The backslash \ is used as an escape character in this case.
  • The question mark ? is a special character often used in regular expressions. Escaping it (i.e., \?) ensures that it is treated as a literal question mark in the prompt string.
  • The square brackets [] are also special characters in regular expressions, typically used to define a set of characters. Escaping them (i.e., \[ and \]) ensures that they are treated as literal square brackets in the prompt string.

By using escape characters, we ensure that the playbook correctly interprets the prompt and proceeds with the expected behaviour during the upgrade process.
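
To see the difference without touching a switch, the small playbook below compares an escaped and an unescaped pattern against the literal prompt text on localhost. It is a sketch for illustration only, and the variable names (cli_prompt, escaped_pattern, unescaped_pattern) are made up for this example.

```yaml
# Sketch only: runs on localhost, no switch needed.
- name: Compare an escaped and an unescaped prompt pattern
  hosts: localhost
  gather_facts: false
  vars:
    cli_prompt: "Do you want to remove the above files? [y/n]"
    escaped_pattern: 'Do you want to remove the above files\? \[y/n\]'
    unescaped_pattern: 'Do you want to remove the above files? [y/n]'
  tasks:
    - name: Only the escaped pattern matches the literal prompt text
      ansible.builtin.assert:
        that:
          # Escaped: ? and [ ] are matched as literal characters
          - cli_prompt is search(escaped_pattern)
          # Unescaped: 's?' means an optional 's', [y/n] means one of y, / or n
          - cli_prompt is not search(unescaped_pattern)
          # The regex_escape filter adds the backslashes for you
          - cli_prompt is search(cli_prompt | regex_escape)
```

The last assertion shows that the regex_escape filter can build the escaped pattern from the plain text, which can be handy when the prompt contains a variable such as a file name.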

Waiting for the Switch to Reboot and Become Reachable

During the upgrade process, the switch will reboot, which causes a temporary loss of connectivity. To handle this, the playbook includes a task that waits for the switch to complete its reboot and become reachable again.

- name: Wait for switch to reboot and become reachable
  wait_for_connection:
    delay: 180
    sleep: 60
    timeout: 900

This task uses the wait_for_connection module, which is designed to pause the playbook execution until the target host is reachable again. The module periodically checks the connectivity to the host by attempting to establish an SSH connection.

The wait_for_connection module has the following parameters in this example:

  • delay - The number of seconds to wait before starting to check the connectivity. In this case, the task will wait for 180 seconds before it begins checking the connection.
  • sleep - The number of seconds to wait between each connectivity check. Here, the task will wait for 60 seconds between each check.
  • timeout - The total number of seconds to wait for the host to become reachable before considering the task as failed. In this example, the task will wait for 900 seconds (15 minutes) before timing out.

Conditional Execution of Upgrade Tasks

In the playbook, there's a conditional statement that controls the execution of the entire upgrade process block based on the current IOS version:

when: new_version != facts_output['ansible_facts']['ansible_net_version']

This when keyword checks if the target switch's current IOS version is different from the new_version variable specified in the playbook (in this case, 17.06.05). If the condition is true, meaning the switch is not already running the desired version, the entire block of tasks within the upgrade process will be executed.

By adding this conditional statement, we ensure that the playbook does not unnecessarily execute the upgrade tasks if the switch is already running the desired IOS version. This can help save time, prevent potential issues, and reduce the risk of unintended consequences in the network.
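
Note that a != comparison is also true when the switch already runs a newer release than new_version, in which case the block would attempt a downgrade. If that matters in your network, the version test can be used to compare releases instead. The playbook below is a sketch that runs on localhost, with current_version standing in for ansible_net_version.

```yaml
# Sketch only: runs on localhost to show the version test.
- name: Compare version strings with the version test
  hosts: localhost
  gather_facts: false
  vars:
    new_version: "17.06.05"
    current_version: "17.06.03"   # assumption: stand-in for ansible_net_version
  tasks:
    - name: Report when the running version is older than the target
      ansible.builtin.debug:
        msg: "{{ current_version }} is older than {{ new_version }}, the upgrade block would run"
      when: current_version is version(new_version, '<')
```

In the real playbook, the same test could replace the != condition on the upgrade block.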

Running the Playbook

Run the playbook with the ansible-playbook command, passing inventory.ini with the -i option. As the playbook runs, you'll see output similar to the following.

PLAY [9300 Upgrade] **************************************************************************************************************************************************************

TASK [Gather facts] **************************************************************************************************************************************************************
[WARNING]: ansible-pylibssh not installed, falling back to paramiko
ok: [test_switch]

TASK [print output] **************************************************************************************************************************************************************
ok: [test_switch] => {
    "msg": "Current Version is 17.06.03"
}

TASK [Save the Config] ***********************************************************************************************************************************************************
ok: [test_switch]

TASK [Save config output] ********************************************************************************************************************************************************
ok: [test_switch] => {
    "msg": {
        "changed": false,
        "failed": false,
        "stdout": [
            "Destination filename [startup-config]? \nBuilding configuration...\n[OK]"
        ],
        "stdout_lines": [
            [
                "Destination filename [startup-config]? ",
                "Building configuration...",
                "[OK]"
            ]
        ]
    }
}

TASK [Check for old files and remove them if found] ******************************************************************************************************************************
ok: [test_switch]

TASK [Display install_remove output and result] **********************************************************************************************************************************
ok: [test_switch] => {
    "msg": "Old files removed successfully."
}

TASK [Copy IOS image via FTP] ****************************************************************************************************************************************************
ok: [test_switch]

TASK [Install and activate new IOS image] ****************************************************************************************************************************************
ok: [test_switch]

TASK [debug] *********************************************************************************************************************************************************************
ok: [test_switch] => {
    "install_activate_output.stdout_lines": [
        [
            "install_add_activate_commit: START Tue May  2 09:22:04 UTC 2023",
            "install_add_activate_commit: Adding PACKAGE",
            "install_add_activate_commit: Checking whether new add is allowed ....",
            "",
            "--- Starting initial file syncing ---",
            "Info: Finished copying flash:cat9k_iosxe.17.06.05.SPA.bin to the selected switch(es)",
            "Finished initial file syncing",
            "",
            "--- Starting Add ---",
            "Performing Add on all members",
            "  [1] Add package(s) on switch 1",
            "  [1] Finished Add on switch 1",
            "Checking status of Add on [1]",
            "Add: Passed on [1]",
            "Finished Add",
            "",
            "Image added. Version: 17.06.05.0.5797",
            "install_add_activate_commit: Activating PACKAGE",
            "Following packages shall be activated:",
            "/flash/cat9k-wlc.17.06.05.SPA.pkg",
            "/flash/cat9k-webui.17.06.05.SPA.pkg",
            "/flash/cat9k-srdriver.17.06.05.SPA.pkg",
            "/flash/cat9k-sipspa.17.06.05.SPA.pkg",
            "/flash/cat9k-sipbase.17.06.05.SPA.pkg",
            "/flash/cat9k-rpboot.17.06.05.SPA.pkg",
            "/flash/cat9k-rpbase.17.06.05.SPA.pkg",
            "/flash/cat9k-lni.17.06.05.SPA.pkg",
            "/flash/cat9k-guestshell.17.06.05.SPA.pkg",
            "/flash/cat9k-espbase.17.06.05.SPA.pkg",
            "/flash/cat9k-cc_srdriver.17.06.05.SPA.pkg",
            "",
            "This operation may require a reload of the system. Do you want to proceed? [y/n]y",
            "",
            "--- Starting Activate ---",
            "Performing Activate on all members",
            "  [1] Activate package(s) on switch 1",
            "    --- Starting list of software package changes ---",
            "    Old files list:",
            "      Modified cat9k-cc_srdriver.17.06.03.SPA.pkg",
            "      Modified cat9k-espbase.17.06.03.SPA.pkg",
            "      Modified cat9k-guestshell.17.06.03.SPA.pkg",
            "      Modified cat9k-lni.17.06.03.SPA.pkg",
            "      Modified cat9k-rpbase.17.06.03.SPA.pkg",
            "      Modified cat9k-rpboot.17.06.03.SPA.pkg",
            "      Modified cat9k-sipbase.17.06.03.SPA.pkg",
            "      Modified cat9k-sipspa.17.06.03.SPA.pkg",
            "      Modified cat9k-srdriver.17.06.03.SPA.pkg",
            "      Modified cat9k-webui.17.06.03.SPA.pkg",
            "      Modified cat9k-wlc.17.06.03.SPA.pkg",
            "    New files list:",
            "      Added cat9k-cc_srdriver.17.06.05.SPA.pkg",
            "      Added cat9k-espbase.17.06.05.SPA.pkg",
            "      Added cat9k-guestshell.17.06.05.SPA.pkg",
            "      Added cat9k-lni.17.06.05.SPA.pkg",
            "      Added cat9k-rpbase.17.06.05.SPA.pkg",
            "      Added cat9k-rpboot.17.06.05.SPA.pkg",
            "      Added cat9k-sipbase.17.06.05.SPA.pkg",
            "      Added cat9k-sipspa.17.06.05.SPA.pkg",
            "      Added cat9k-srdriver.17.06.05.SPA.pkg",
            "      Added cat9k-webui.17.06.05.SPA.pkg",
            "      Added cat9k-wlc.17.06.05.SPA.pkg",
            "    Finished list of software package changes",
            "  [1] Finished Activate on switch 1",
            "Checking status of Activate on [1]",
            "Activate: Passed on [1]",
            "Finished Activate",
            "",
            "--- Starting Commit ---",
            "Performing Commit on all members",
            "  [1] Commit package(s) on switch 1",
            "  [1] Finished Commit on switch 1",
            "Checking status of Commit on [1]",
            "Commit: Passed on [1]",
            "Finished Commit",
            "",
            "Send model notification for install_add_activate_commit before reload",
            "[1]: Performing Upgrade_Service",
            "300+0 records in",
            "300+0 records out",
            "307200 bytes (307 kB, 300 KiB) copied, 0.195365 s, 1.6 MB/s",
            "/usr/sbin/boot_verify_package: /ucode0/cat9k-select_srdriver.SPA.pkg: Digital Signature Verified",
            "/usr/sbin/boot_verify_package: updatepcr8d unavailable, KGV data not extended to PCR8 - No such file or directory.",
            "/usr/sbin/boot_verify_package: INFO: Collected KGV data for package cat9k-select_srdriver.SPA.pkg.",
            "  SUCCESS: Upgrade_Service finished",
            "Install will reload the system now!",
            "SUCCESS: install_add_activate_commit  Tue May  2 09:36:06 UTC 2023",
            "  PID TTY          TIME CMD"
        ]
    ]
}

TASK [Wait for switch to reboot and become reachable] ****************************************************************************************************************************
ok: [test_switch]

TASK [Gather new facts] **********************************************************************************************************************************************************
[WARNING]: ansible-pylibssh not installed, falling back to paramiko
ok: [test_switch]

TASK [New Version] ***************************************************************************************************************************************************************
ok: [test_switch] => {
    "msg": "New Version is 17.06.05, Upgrade Successfully Completed"
}

PLAY RECAP ***********************************************************************************************************************************************************************
test_switch                : ok=12   changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

This output displays the execution of each task in the playbook and their results. If the upgrade is successful, you will see a message similar to "New Version is 17.06.05, Upgrade Successfully Completed" at the end of the output.
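
If the versions do not match, the final debug task is simply skipped and the play still finishes without an error. If you would rather have the run fail in that case, a task like the one below could follow the 'Gather new facts' task inside the block. It is a sketch that reuses the new_version and facts_output_new names from the playbook.

```yaml
# Sketch only: fail the play when the switch comes back on an unexpected version.
- name: Fail if the switch did not come back on the expected version
  ansible.builtin.assert:
    that:
      - facts_output_new['ansible_facts']['ansible_net_version'] == new_version
    fail_msg: "Expected {{ new_version }} but the switch reports {{ facts_output_new['ansible_facts']['ansible_net_version'] }}."
    success_msg: "New Version is {{ new_version }}, Upgrade Successfully Completed"
```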

Closing Up

In conclusion, this blog post has provided a detailed guide on upgrading Cisco IOS XE on Catalyst 9000 Series Switches using Ansible. We have discussed the prerequisites, inventory and group variables, and the structure of the playbook, covering the tasks and conditional execution. Additionally, we have explained the importance of escape characters, waiting for the switch to become reachable, and the need to be familiar with the manual upgrade process.