Mass upgrading Palo Alto firewalls

My company just bought 900 PA-200 firewalls.  Unfortunately, they all come pre-loaded with firmware version 5.0.6.  The current version is 7.0.1.  To get from 5.0.6 to 7.0.1, you must install a newer content version, then upgrade to version 6.0, then to 6.1, and finally to 7.0.1.  Oh, and we want to install A/V as well, in preparation for shipping the units to the stores.

They have a product called Panorama that manages their firewalls (you can't manage hundreds of firewalls without it, if you ask me).  It can perform upgrades from one version to another, but it isn't smart enough to know what steps must be taken to get from 5.0.6 to 7.0.1.  Someone would need to know the process and direct Panorama through it, each step of the way.  Since I have 900 of them to upgrade, I needed to come up with a better way!  Waiting until they were at the store, connected via a T1 circuit, is not a good option either, as the content, A/V, and all the firmware upgrades add up to over 1.1 GB.

A great feature for Panorama would be a "base" template you could set for each Device Group.  That "base" template would specify the Content, A/V, and firmware versions for every device in the group.  Whenever devices are added to the device group, Panorama would automatically bring them to the proper content, A/V, and firmware versions.

But since Panorama isn't that smart yet, it's the Palo Alto API and a little scripting magic to the rescue.

Since I've been writing a script to handle our installation process, I had already written a Palo Alto class to handle all the communications to the PA-200s and to Panorama.  I did have to add a few more routines to the class to handle everything that I needed, but it now works.

Our process works this way:
1.  A tech unpacks 10 PA-200 firewalls and attaches their Management port to a subnet on our corporate network.
2.  The tech scans the serial number bar codes on the back of the PA-200s, adding them to Panorama as “Managed Devices”.
3.  The tech adds them to the appropriate Template and a special device group that exists just for the upgrade process.
4.  The tech sets an IP address, Mask, and Gateway on each unit, pointing them to DNS servers and the Panorama server, then commits the change.  (This is a copy/paste process where the IP is different for each of the 10 units being upgraded.)
5. Finally, the tech performs a Commit in Panorama.
6.  The tech then gets back to other work, waiting for an email that will be sent once all the devices are upgraded.  This should happen about 1:35 to 1:45 (an hour and 35 to an hour and 45 minutes) after the Panorama commit is done.

The real work gets done in a script that runs every 5 minutes.  This script:
1.  Gets a list of all the devices in the special device group.
2.  Attempts to create an object of my custom PA class for each device.  If the script can't communicate with a device, that one is discarded for now; this script will retry in a few minutes.
3.  Panorama is checked to make sure there are no active jobs for each serial number.  If there are, that device is removed from further checks.
4.  Each firewall is checked to make sure it has no active jobs.  If it does, it's removed from further checks.
5.  The content version is checked on each PA-200.  If one isn't found, its serial number is added to the Content queue and it's removed from further checks.
6.  The anti-virus version is checked on each PA-200.  If one isn't found, its serial number is added to the Anti-Virus queue and it's removed from further checks.
7.  If the firmware starts with "5", its serial number is added to the 6.0 upgrade queue and it's removed from further checks.
8.  If the firmware starts with "6.0", its serial number is added to the 6.1 upgrade queue and it's removed from further checks.
9.  If the firmware starts with "6.1", its serial number is added to the 7.0.1 upgrade queue and it's removed from further checks.  (This version-to-queue sorting is sketched in the code after this list.)
10.  If 7.0.1 is installed, the script sets the IP address back to the default and issues a commit.
11.  If 7.0.1 has been installed and the box is unreachable (because the commit has taken effect), the device is removed from the special device group and moved to a Pending group.
12.  All the various "queues" I mentioned then get kicked off, with the serial numbers of the devices that need each step passed to Panorama via the XML API.  There's additional logic to send emails once all the devices are out of the device group.
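
Here's a minimal sketch of the version-to-queue sorting from steps 7 through 9 (the op command and the sw-version element are standard PAN-OS; the $device object and the queue plumbing are simplified stand-ins for my class):

<?php
// Map the current firmware version to the next required upgrade step.
function upgradeQueueFor($swVersion) {
    if (strpos($swVersion, '5') === 0)   return '6.0';    // 5.x upgrades to 6.0 first
    if (strpos($swVersion, '6.0') === 0) return '6.1';    // 6.0 upgrades to 6.1 next
    if (strpos($swVersion, '6.1') === 0) return '7.0.1';  // 6.1 upgrades to 7.0.1 last
    return null;                                          // already on 7.0.1
}

// $device->opCmd() is a stand-in for my class's XML API wrapper.
$info    = $device->opCmd('<show><system><info></info></system></show>');
$version = (string)$info->result->system->{'sw-version'};

if ($queue = upgradeQueueFor($version)) {
    $queues[$queue][] = $serial;  // each queue is later handed to Panorama in bulk
}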

In practice, this is taking about 1:35 (an hour and 35 minutes) to fully upgrade 10 firewalls, though I suspect we could ramp this up to 20 or more and it would take very close to the same time, since Panorama upgrades all the devices in parallel.

This will have to do until Palo Alto upgrades Panorama to do it for me.

August 9, 2015 at 5:08 pm

Palo Alto and the power of an API

We recently bought Palo Alto PA-200 firewalls for our retail locations to replace our aging CheckPoint UTMs.  I didn’t investigate their API at all during the time we were looking at CheckPoint competitors.  I knew it had one, but hadn’t really given it a lot of thought.  Now that we have a massive roll-out ahead of us, I’ve started scripting parts of the process.  I must say that I love the flexibility that their API gives us.

In the past, for any major roll-out, I’ve scripted the process using telnet / SSH / HTTP (for web scraping), basically whatever interface the vendor allowed.  My goal is to make the installation fast and easy to support, while reducing the chance of human error as much as possible.  The hassle with CLI scripting for remote devices is always the parsing.  While it’s possible to do a good job parsing things manually, it’s time consuming and prone to error.  With an API, it’s faster and easier to code and you get data back in a predictable format.

If what you want to do can be done via the CLI, Palo Alto has included a "Secret Decoder Ring" to help you figure out the API…  The secret is that the WebGUI and CLI both use the API for most everything they do.  So, in the CLI you can simply enter "debug cli on" and capture most of the XML you need for your API call by watching what the CLI does.  For example, if I do a "show jobs all", I get this XML back:

<request cmd="op" cookie="8856737959639002" uid="500"><operations><show><jobs><all/></jobs></show></operations></request>

To make the same call via the API, take the portion inside <operations>…</operations> above and pass it as the cmd parameter:

http(s)://hostname/api/?type=op&cmd=<show><jobs><all/></jobs></show>&key=[Your API Key]

To reboot your firewall via the API:

http(s)://hostname/api/?type=op&cmd=<request><restart><system></system></restart></request>&key=[Your API Key]
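
From a script, these calls are easy to wrap.  Here's a minimal PHP sketch (the function name is mine, and SSL verification is disabled because these units still have their self-signed certificates; you may not want that in production):

<?php
// Minimal wrapper for PAN-OS XML API calls (error handling omitted).
function paApi($host, $key, $query) {
    $ch = curl_init("https://$host/api/?$query&key=" . urlencode($key));
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => false,  // self-signed cert on the firewall
    ));
    $xml = curl_exec($ch);
    curl_close($ch);
    return simplexml_load_string($xml);   // API responses come back as XML
}

// The "show jobs all" example from above:
$jobs = paApi('192.168.1.1', $apiKey, 'type=op&cmd=' . urlencode('<show><jobs><all/></jobs></show>'));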

Granted, there are some things I've not been able to figure out how to do via the API, like checking for the existence of an imported config file.  Via the CLI, just enter "show config saved " and hit TAB after the last space.  The auto-complete feature of the PA CLI will show you a directory listing of saved config files.  If you do this with debugging turned on, you'll notice that no "debug" info appears, so the autocomplete function must not use the API (or debug output for autocomplete is suppressed for readability purposes).

I expect that everything I need to do relative to the installation process can be handled via the API:

1. Import a pre-generated configuration file
2. Load the imported configuration file
3. Issue a local Commit
4. Check the status of the Commit (steps 3 and 4 are sketched in the code after this list)
5. Read the Serial Number of the remote device being installed
6. In Panorama move the device from the “Pending” device group to the “Production” device group
7. Issue a Panorama commit for this device (by Serial Number)
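
As a sketch of steps 3 and 4, using the paApi() wrapper from earlier (the XML paths match the responses I've seen, but verify them against your own gear):

<?php
// Issue a local commit; the response includes a job ID we can poll.
$resp  = paApi($host, $key, 'type=commit&cmd=' . urlencode('<commit></commit>'));
$jobId = (string)$resp->result->job;

// Poll the job until it finishes.  Status is ACT while running, FIN when done.
do {
    sleep(10);  // give the commit time to make progress
    $job    = paApi($host, $key, 'type=op&cmd=' . urlencode("<show><jobs><id>$jobId</id></jobs></show>"));
    $status = (string)$job->result->job->status;
} while ($status != 'FIN');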

If you have any need to programmatically interact with a Palo Alto firewall, I encourage you to dig into the API.  There's a ton of very good data just waiting to be accessed, very easily.


July 23, 2015 at 7:33 pm

F5 GTM iRule to enforce Google Safe Search

There are tons of tools you can use to enable Google Safe Search…  Essentially, you need to serve a custom record for www.google.com that's a CNAME pointing to forcesafesearch.google.com.

Anyhow, for our Customer WiFi, we want to take some steps to limit the visibility of adult results to our customers (both for liability and PR reasons).  Since we have a large number of retail locations, all running through a central data center, we run a high-performance DNS cache on our F5.  While I'm sure there are lots of ways to solve this issue, we created an iRule to handle it:

when DNS_REQUEST {
    if { [DNS::question name] == "www.google.com" } {
        set lookup [RESOLV::lookup @[RESOLVING DNS SERVER HERE] -a "forcesafesearch.google.com"]
        set ip [getfield $lookup " " 1]
        DNS::answer insert "www.google.com. 300 IN CNAME forcesafesearch.google.com"
        DNS::answer insert "forcesafesearch.google.com. 300 IN A $ip"
        DNS::return
    }
    if { [DNS::question name] ends_with "explicit.bing.net" } {
        DNS::answer clear
        DNS::header rcode NXDOMAIN
        DNS::return
    }
}

Just replace the text "[RESOLVING DNS SERVER HERE]" with the IP address of a server capable of resolving the forcesafesearch DNS query.  If you are using Route Domains, don't forget to include the route domain on the end of your DNS server IP (e.g. @10.1.1.1%2 for route domain 2).

As a bonus, this iRule also blocks explicit.bing.net, the domain that Bing uses to display thumbnails/videos for explicit content.

July 15, 2015 at 5:59 pm

Making administrative web apps in PHP

What do I mean by administrative web apps?  Basically, a web app that lets you update a series of database tables in a controlled way.

Let's take my most recent one as an example.  We are about to deploy LTE-capable routers (referred to as modems in some places in this entry) to our remote locations.  800+ of them.  Each remote site will have an LTE router with a modem and two SIM cards.  One of those SIMs will be active, and the other will be there so we can switch to it in the event we have trouble with the primary SIM (if Vendor A has trouble, Vendor B might work better).  Each vendor requires certain data when activating a SIM.  One wants the modem IMEI, another wants the modem MEID.  They all want to know which SIM ID is involved.  Oh, and each SIM is associated with a static private IP address.

The easiest way to manage such a thing is probably via a database table that can be edited via a web interface.  If it's just you doing it all, perhaps you can edit the tables "in the raw" using something like phpMyAdmin.  But if you have a team involved, you might want to dial back the control a bit.  This is where you'd want to build a website to manage these database tables intelligently.

Rule #1:  Use a Web Framework!

A couple of years back I learned about Bootstrap, a framework that lets you easily create clean, professional-looking websites.  Any of the major front-end frameworks will do the same for your web apps.  Just pick one, and stick with it.

Rule #2:  Authenticate!

Whenever you are creating any sort of administrative web app, make sure authentication plays a part.  You want to be sure that the people using your web apps are supposed to be using them.  While this is a "duh" statement for anyone writing publicly available apps, it holds just as true for a private app that's only visible to company employees.  In my case, I had a rudimentary system in place for a very important page, but many others were wide open.  While revamping the system using Bootstrap, I created a simple RADIUS-based authorization include file to add to all the pages I wanted to secure.  I used a RADIUS class I found online, I think this one.  I actually love that I'm not very familiar with the class, because I've had to do so little with it.  I pretty much dropped it in, and it's been working great ever since.  Since it's just a single "include" line, securing other pages is drop-dead simple.
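
For the curious, here's a minimal sketch of what such an include can look like (illustrative only; my real version validates the credentials against RADIUS using the class mentioned above):

<?php
// auth_include.php - require this at the top of any page you want to secure.
session_start();

if (empty($_SESSION['authenticated_user'])) {
    // Not logged in: remember where the user was headed, then send them
    // to the login page (which performs the actual RADIUS check).
    $_SESSION['return_to'] = $_SERVER['REQUEST_URI'];
    header('Location: /login.php');
    exit;
}

$current_user = $_SESSION['authenticated_user'];  // handy for audit trails later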

Rule #3:  Use Editor

This is an amazingly smart library of code.  With it, you can build a modern AJAX enabled web interface to manage your database tables very easily.  You may have to pick up a little bit of Javascript knowledge, but the back-end is PHP, and if you are half decent at PHP, you should be all set.

They have example code galore.  The majority of what you probably want to know how to do is right there.

If what you want to do isn’t listed in the examples, just ask!  Support on their forum is very good.  Most of my problems seem to stem from a lack of knowledge on the Javascript side of things.


Ok, time to talk about the elephant in the room…  No, Editor isn't free.  It's $119 if you are a solo developer, and the price goes up from there depending on the size of your team.  But trust me, it is so worth it.  If you were to try to write your own class library to do all the things Editor does, you'd spend many, many hours doing it, making the price tag a bargain.

Rule #4: Make an Audit Trail!

Any time you build a web interface that allows users to edit database tables for anything important, you should include code to audit the changes, so you'll see exactly who changed what.  If you've used an authentication include, as I suggest, you can probably grab the logged-in user and write that into the audit trail as well.  I'm not suggesting this so you can beat up on the guy who made a mistake.  It's so that you can quickly look back, see what has changed, and fix the mistake.  It also allows you to do a little remedial training with whoever made the mistake, so they won't make it again.

Now, an audit trail when using Editor is a bit of a challenge.  Unfortunately, the PHP code for Editor doesn't include an audit capability or anything similar.  If you are serious about this, though, you can find the driver file for the database type you are using and modify it to create your audit trail.  Heck, a timestamped log file including the username and the UPDATE, DELETE, and CREATE SQL queries is probably all you need.  I actually parsed the SQL and wrote it out to an audit table so the NOC team can look through it (in another tab in the web interface) and figure out what happened, but that's probably just me.
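
As a sketch, the log-file version could be as simple as this (the file path and function name are mine; the call would live in the driver file, wherever it executes a write query):

<?php
// Append one line per write query: timestamp, user, and the SQL itself.
function audit_log($username, $sql) {
    $line = sprintf("%s\t%s\t%s\n", date('Y-m-d H:i:s'), $username, $sql);
    file_put_contents('/var/log/webapp_audit.log', $line, FILE_APPEND | LOCK_EX);
}

// Only write queries are worth auditing.
if (preg_match('/^\s*(INSERT|UPDATE|DELETE)/i', $sql)) {
    audit_log($current_user, $sql);
}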

Rule #5:  Consider the Work Flow!

If you are writing this for someone else to use, it may be tempting to quickly write it in the fastest way you can, and then move on.  Don’t do it.  In my case, my users will be using this to keep 800+ sites straight.  That’s a big job, so I’m trying to make it as easy as possible.

Consider the things the users will need to do with the application.  When adding a new modem to my database, they’ll first scan in the IMEI (using a barcode scanner), then the MEID, so those are the first two fields on the “Create Modem” page.  The next item is the location number that will get this modem, then they can select two SIMs, and then a Vendor dropdown to indicate which SIM is active.  Design with the workflow in mind.

Build in logic to keep errors from happening.  In my case, I'm repopulating the SIM dropdown lists to only include SIMs that have not been selected before, since a SIM can only be in one modem at a time (see the sketch below).  Similarly, if the user has selected AT&T and Verizon SIMs, don't let them select Sprint as the Active Vendor.
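
As a sketch, the "only unassigned SIMs" dropdown boils down to a query like this (table and column names are made up for illustration; $pdo is an open PDO connection):

<?php
// A SIM is available if no modem references it in either slot.
$sql = "SELECT id, iccid FROM sims
        WHERE id NOT IN (SELECT sim1_id FROM modems WHERE sim1_id IS NOT NULL)
          AND id NOT IN (SELECT sim2_id FROM modems WHERE sim2_id IS NOT NULL)";

foreach ($pdo->query($sql) as $row) {
    printf("<option value=\"%d\">%s</option>\n", $row['id'], htmlspecialchars($row['iccid']));
}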

Handle “special things” in the web app.  In my case, I’m going to have some LTE Survey kits that I’m giving “fake” location numbers.  I’ve added logic to the tool to prevent records marked with “fake” location numbers from being edited by the user.

July 4, 2015 at 6:40 pm

CradlePoint and NTP

CradlePoint routers have one interesting habit that I've noticed:  if they can't connect to the WAN, they won't try to sync with the NTP server they're configured for.

On the surface, this seems smart.  Don’t bother trying to sync time unless you have a path to the Internet, right?

But what if you have a local time server?  In that light, this is a poor decision.

Perhaps a bit more complexity in their code would satisfy everyone.  If the configured NTP server is an IP address that falls into RFC 1918 (private) space, go ahead and try to sync as soon as the router boots.  If it's a DNS name, or a public IP, then wait for a WAN connection.

Anyhow, I noticed this some time back, but didn’t think too much of it.  As it turns out, it can cause an issue in our particular use-case.

In my previous post, I outlined that we are using the API to perform speed tests at remote sites, using what I call an “LTE Survey Kit”, to gather data about all three of the major carriers at once at each remote location.  We noticed an anomaly in the speed test results.  Most of the time it works fine, but occasionally we get back a completed speed test with a speed of 0.00.  That’s right.  It told us that it FINISHED the test in the 60 seconds we provided, but the speed was calculated to be 0.00.

How can that be?  If it took the full 60 seconds to download 1 MB of data…  Well, let’s do some simple math:  1000 KB / 60 seconds = 16.67 KBps.

So, if it was as slow as possible, while still completing, I’d expect a result greater than 0.00.

A bit of testing today discovered that on the few occasions we see this anomalous result, the test started before an NTP sync was done and finished after the sync completed.  The speed was calculated as if the test ran from sometime in 1969 (the pre-sync clock) until present-day 2015, hence the speed of 0.00 KBps.

So, how to fix it?  Well, we could simply remove the NTP server config so it doesn't sync, but I thought adding a wait loop would be better.  So now, when this test runs, after the CBA850s come up on the individual carriers, I check to see if NTP has synced (another API call; a sketch follows).  Once all the devices with active WAN links have synced with NTP, we start the test.  As a side benefit, this gives each of the devices a little more time on the carrier prior to kicking off a test.
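
The wait loop itself is simple.  Here's a sketch (the status path for the NTP sync state is a placeholder; dig through your firmware's status tree to find where it lives on your units):

<?php
// GET a path from the CradlePoint API and return the "data" portion.
function cpGet($ip, $user, $pass, $path) {
    $ch = curl_init("https://$ip$path");
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => false,   // self-signed cert on the router
        CURLOPT_USERPWD        => "$user:$pass",
    ));
    $resp = json_decode(curl_exec($ch), true);
    curl_close($ch);
    return isset($resp['data']) ? $resp['data'] : null;
}

// Hold the speed test until this unit reports an NTP sync.
do {
    sleep(15);
    $synced = cpGet($ip, $user, $pass, '/api/status/NTP_SYNC_PATH_HERE');  // placeholder path
} while (!$synced);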

Also, I found out today that if you query the API (/api/status/rf, I believe), you get back an array of recent SINR readings.  The CradlePoint polls this every 18 seconds, so it could be very useful in checking for stability.  Having the extra wait time for NTP sync means we can collect that data a little longer.

July 2, 2015 at 8:01 am

Can’t trust Cell coverage maps? Make your own!

At my day job, I'm responsible for the network connectivity for hundreds of remote locations.  We use 3G today as a backup to our T1 circuits.  One thing most IT people who have tried to put in a cellular network can probably attest to:  vendor-provided cellular coverage maps absolutely suck.  Even if you give the vendors a list of your addresses, you'll get a far rosier picture of the coverage they can provide than exists in this reality.

So, what do you do when the vendor-provided data isn't any good?

You make your own!


We've taken a small plastic tub, a consumer-grade 5-port Ethernet switch, three CradlePoint CBA850s, and a Quirky power strip to handle the huge power bricks the CBA850s need.  We've mounted the power strip to the bottom of the tub, the switch to the side of the tub on one end, and the three CBA850s to the sides, so that their antennas fold down just below the top.  The CBA850s are pre-wired to the Ethernet switch, along with an extra CAT5 cable about 10 feet long.  The CBAs and the switch are plugged into the power strip.

The CBA850s are configured for AT&T, Sprint, and Verizon, so we can get a good picture of the coverage of all three carriers.  They are configured with static IPs not used elsewhere in our locations.  When our technician arrives at a location, he removes the top of the case, folds up the antennas, finds a power outlet and a free Ethernet port, and plugs in.  A quick call to our NOC to let them know which location he's at and which port to turn up is all that's left for the tech to do.  At that point, our NOC staff can kick off a script which updates the router config for that location to NAT the CBA850s to addresses specific to that store, allowing them to be reached from headquarters.

Then, using the API magic I mentioned in my last post, the script validates that all three CBA850s are reachable, then it checks in with them to see if the WAN is connected, waiting around 5 minutes for any stragglers.  Once they are all up, or the expiration time has passed, it kicks off a series of speed tests, both upload and download, gathering the results of the tests along with other diagnostic info (SINR, signal strength, etc).  Drop that data into a database table, and there’s our “map”.

That’s no MAP!  That’s just a bunch of numbers!

No, our “map” won’t look like a map, but it will have data telling us which of the three main cellular providers is the best at every one of our locations that we’ve tested.  From the perspective of our management, that’s really all that matters.

July 1, 2015 at 8:39 am

CradlePoint API info

Every CradlePoint router (with at least a reasonably recent firmware) includes a very nice API.

However, if you search looking for documentation on their website about it, you’ll only find information on the API for ECM, their central management service.

Here are a few very useful URLs that you can call with the RESTful client of your choice:

Figure out what model of CradlePoint you’ve reached, and/or the serial number:
https://[CradlePoint IP]/api/status/product_info/

{
    "data": {
        "company_name": "Cradlepoint, Inc.",
        "copyright": "Cradlepoint, Inc. 2015",
        "mac0": "REDACTED",
        "company_url": "http://cradlepoint.com",
        "manufacturing": {
            "board_ID": "050000",
            "mftr_date": "20150401",
            "serial_num": "REDACTED"
        },
        "product_name": "CBA850"
    },
    "success": true
}

Get your firmware version (major.minor.patch):
https://[CradlePoint IP]/api/status/fw_info
{
    "data": {
        "build_date": "Thu Feb 19 12:00:07 MST 2015",
        "manufacturing_upgrade": false,
        "major_version": 5,
        "custom_defaults": false,
        "minor_version": 3,
        "fw_update_available": false,
        "patch_version": 4,
        "upgrade_minor_version": 0,
        "build_version": 13953,
        "upgrade_major_version": 0,
        "upgrade_patch_version": 0,
        "build_type": "RELEASE"
    },
    "success": true
}

Find out if you’re connected:

https://[CradlePoint IP]/api/status/wan/connection_state

{
    "data": "connected",
    "success": true
}

Get your WAN interface IP:
https://[CradlePoint IP]/api/status/wan/ipinfo
{
    "data": {
        "netmask": "255.255.255.248",
        "dns": [
            "10.10.10.10",
            "10.10.11.11"
        ],
        "ip_address": "172.16.24.27",
        "primary": "lte-REDACTED",
        "gateway": "172.16.24.25"
    },
    "success": true
}

Too much good diag stuff to mention: 

Please note, I REDACTED most of the unique identifying info, but these fields are all available on your gear.  To get the portion of the URL that's redacted, look in the "primary" key of the result of your WAN IP info, shown just above.

https://[CradlePoint IP]/api/status/wan/devices/lte-REDACTED/diagnostics

{
    "data": {
        "HM_PLMN": "310410",
        "CELL_ID": "176898562 (0xa8b4202)",
        "CARRID": "AT&T",
        "CS": "UP",
        "PIN_STATUS": "READY",
        "GSN": "REDACTED",
        "PRD": "MC400LPE (SIM1)",
        "VER_PKG": "05.05.16.02_ATT,005.010_002",
        "MDN": "REDACTED",
        "MDL": "MC400LPE (SIM1)",
        "TXCHANNEL": "20576",
        "HOMECARRID": "AT&T",
        "MODEMOPMODE": "Online",
        "ROAM": "1",
        "FW_CARRIER_LOAD": "ATT",
        "VER": "SWI9X15C_05.05.16.02 r21040 carmd-fwbuild1 2014/03/17 23:49:48",
        "CFGAPNMASK": "65534",
        "MODEMPSSTATE": "Attached",
        "RXCHANNEL": "2576",
        "LTEBANDWIDTH": "5 MHz",
        "VER_PREF_PKG": "05.05.16.02_ATT,005.010_002",
        "RSRQ": "-7",
        "RSRP": "-90",
        "DBM": "-69",
        "SCRAPN": "16",
        "MDM_MODE_CAPABILITIES": "55",
        "SS": "100",
        "LAST_PIN": "",
        "ICCID": "REDACTED",
        "BANDULFRQ": "824-849",
        "TX_LTE": "-6.5",
        "RFBAND": "Band 5",
        "SELAPN": "1",
        "DISP_MEID": "REDACTED",
        "SINR": "21.2",
        "EMMSTATE": "Registered",
        "VER_PRETTY": "5.5.16.2",
        "CHIPSET": "9X15C",
        "MODEMTEMP": "40",
        "HW_VER": "1.0",
        "PIN_RETRIES": "3",
        "IS_LTE": "true",
        "CGSN": "REDACTED",
        "MFG_MDL": "MC7354-CP",
        "MDM_CONTROL_TYPE": "NORMAL",
        "MFG": "CradlePoint Inc.",
        "PRLV": "1",
        "LAST_PIN_VALID": "False",
        "DISP_IMEI": "REDACTED",
        "PRI_VER": "05.03",
        "DEFAPN": "1",
        "DORMANT": "Dormant",
        "PUK_RETRIES": "10",
        "DEFAPNTYPE": "IP",
        "EMMSUBSTATE": "Normal Service",
        "SIM_LOCK": "FALSE",
        "SERDIS": "LTE",
        "MODEMIMSSTATE": "No service",
        "CUR_PLMN": "310410",
        "BANDDLFRQ": "869-894",
        "RFCHANNEL": "2576",
        "MODEMSYSMODE": "LTE",
        "IMSI": "REDACTED",
        "EMMCOMMSTATE": "RRC Idle",
        "MDM_DRIVER_CAPABILITIES": "244785",
        "PRI_ID": "9903437"
    },
    "success": true
}

My favorite (so far) is a bit difficult to explain in this blog post, but I’ll try:

https://[CradlePoint IP]/api/control/netperf

To use this, you need firmware 5.4.0 or newer, and you'll really need your own netperf server, but if you get that set up, you should be able to initiate your own speed tests across the LTE link.  You'll need to pass data to this one, though, so it's a bit harder.  Here's my data template, with words surrounded by percent signs as variables:

$json_template = '{"input":{"options":{"limit":{"size":%size%,"time":%timeout%},"port":"","host":"%host%","ifc_wan":"","recv":%recv%,"send":%send%,"tcp":true,"udp":false},"tests":null},"run":1}';

After customizing this for the test that I want to perform, I do an HTTP PUT of this data.  In my case, with PHP, I have to pass my $json like this:  array('data' => $json).
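
Putting that together, the PUT looks roughly like this in PHP (a sketch; fill in your own router IP, credentials, and test parameters):

<?php
// Substitute real values for the %variables% in the template.
$json = str_replace(
    array('%size%', '%timeout%', '%host%', '%recv%', '%send%'),
    array(0, 60, 'netperf.example.com', 'true', 'false'),
    $json_template
);

// HTTP PUT to the netperf control URL, passing the JSON as the 'data' field.
$ch = curl_init("https://$ip/api/control/netperf");
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_SSL_VERIFYPEER => false,  // self-signed cert on the router
    CURLOPT_USERPWD        => "$user:$pass",
    CURLOPT_CUSTOMREQUEST  => 'PUT',
    CURLOPT_POSTFIELDS     => http_build_query(array('data' => $json)),
));
$result = json_decode(curl_exec($ch), true);
curl_close($ch);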

Anyhow, doing this kicks off a speed test that runs for %timeout% seconds.  You can then do a GET to the /api/control/netperf URL and get a status, like so:

https://[CradlePoint IP]/api/control/netperf

{
    "data": {
        "input": {
            "tests": null,
            "options": {
                "udp": false,
                "limit": {
                    "size": 0,
                    "time": 10
                },
                "tcp": true,
                "recv": true,
                "port": null,
                "send": false
            }
        },
        "output": {
            "results_path": null,
            "status": "idle",
            "command": null,
            "error": null,
            "progress": 0,
            "guid": -1
        }
    },
    "success": true
}

In the “output” section above, had I just performed a test, I could look at the value of “results_path”, which is a URL to the results of the test.

There is a TON of great info you can get from the CradlePoint API.  CradlePoint built their web interface on top of the API, so pretty much anything you see in the web interface can be accessed via the API.  In fact, if you simply use a tool like HttpWatch to watch the interaction between your web browser and the remote CradlePoint device, you'll be able to learn how to do all this yourself.


June 30, 2015 at 8:29 pm
