May 5, 2022 by Omna Solomon
Like any government-run technology infrastructure, Land Mobile Radio (LMR) systems should aim to deliver robust and reliable service anchored on well-established operations plans and maintenance structures. However, because of historically varied organizational structures and their unique standing as a mission-critical service for first responders, operational processes and plans for LMR systems differ widely across the nation. While most publicly run technologies are currently under the purview of the jurisdiction’s Information Technology (IT) agency, it is not uncommon to find the LMR operations team in a variety of other agencies, including vehicle fleet shops, first responder agencies, emergency management agencies, 911 districts, and separate agencies tasked with overseeing telecom systems, or to find LMR operations entirely outsourced to a local service shop. As a consequence of distributing LMR system operations over multiple agencies, LMR system management, operators and technical staff tend to adopt operational, maintenance and other procedures from several IT and non-IT agencies, potentially resulting in a lack of robust and uniform best practices.
In recent years, as LMR systems are increasingly run by IT agencies (although law enforcement, fire, 9-1-1, and emergency management agencies continue to operate LMR networks), there have been improvements in implementing uniform approaches to operating networks. However, IT agencies, too, struggle with operating LMR systems due to limited previous experience and because many of the other public technology systems they operate are Solution-as-a-Service subscriptions, or solutions with a large pool of experts (e.g., Microsoft applications or Cisco gear). LMR networks tend to be purpose-built, capital-intensive systems that run on government-owned infrastructure, require niche expertise, and have unique operations and maintenance (O&M) requirements that differ from typical IT requirements.
Despite these long-standing challenges, certain operations principles can and should apply to running LMR systems of any size in a streamlined, coordinated and predictable manner. Here, we provide a few general suggestions and overviews on process and procedure for radio operations teams. The degree to which these apply to a given operator may vary based on the network size, available budget or the agency’s risk tolerance levels. However, all operators should aim to incorporate or adequately address these elements which include, but are not limited to, staffing, sustainability plans, maintenance activities, incident management processes, change management, system monitoring, vendor oversight, budgeting and system documentation. Operations plans and processes should also aim to be exhaustive and address every aspect of the LMR system whether it is replacing/reprovisioning a radio site router, approving installation of a new bidirectional amplifier (BDA), setting up an air-to-ground communications channel, or auditing network security health.
- Personnel & Organizational Structure: It’s not uncommon for an LMR operator to leverage the expertise of in-house agency staff, personnel from multiple agencies, third-party vendors, or a combination thereof. Regardless of the source, each functional system element should have an identified individual and preferably an alternate with well-defined roles and responsibilities. Key personnel (think of your “super tech” who has no equal) can pose single points of failure and therefore should have capable support staff and alternates. If your staff size is too small, consider leveraging a vendor as an interim substitute. Understandably, staff for some rare activities like tower climbing may not be readily defined; however, there should be an established purchase order for obtaining such services rapidly. Finally, think about every aspect of your system and make sure there are qualified resources to maintain and respond to each subsystem. (Is there a backhaul router that hasn’t required maintenance or updates in so long that no one is able to access or troubleshoot it should it fail?)
- Incident Management Plans: A complete cycle for managing incidents from the origin, whether that is a customer service office or network operations center (NOC), to resolution is essential for maintaining public safety grade service. The incident management plan should include flowcharts showing how any issue is raised, triaged, diagnosed, and resolved and, importantly, should outline a process for notifying affected users. End users should also have an avenue for submitting system issues and requests for changes to the operator, with appropriate escalation plans. The following Alarm Management Workflow diagram details an alarm response process.
- Change Management Plans: Although LMR systems are generally more stable compared to most modern IT systems, networks still undergo user- or vendor-driven changes that can impact the operator or user. Change management plans should address how a change is introduced, evaluated, approved and implemented. Specific changes, such as operating system updates, may not have a noticeable impact on the end user, but could still lead to brief outages necessitating advance courtesy notice. Other enhancements that introduce feature changes could require training for all impacted users. Changes such as radio or talkgroup additions, or a desire to encrypt transmissions, can stem from end users, and the change management plan must also address procedures for these types of changes.
- Sustainment and Lifecycle Plans: Any infrastructure element, whether it is endpoint security software, a base station repeater, or a DC rectifier, should have a refresh plan per industry best practices and, importantly, the necessary funding secured. Regular and funded cycles for implementing upgrades have become increasingly common across most IT agencies, while most public safety LMR operators still have to justify and obtain funding for system upgrades on a discrete basis, either as capital or operating expenses. Operators should aim to petition their executives and legislators for stable and recurring funding of LMR systems to cover the necessary recurring system refresh.
- Real Time Monitoring: Attaining public safety grade availability requires not only robust network design but also the operational and maintenance readiness to sustain that level of performance. Real-time monitoring of network health is an essential element of delivering high availability. Due to the capital cost of a complete fault and alarm management system, many operators forgo the hardware needed for remote monitoring or implement only partial monitoring. Operators should aim to implement or subscribe to a real-time network monitoring service to promote rapid diagnosis and resolution of issues.
- Documentation: Accurate and comprehensive documentation is indispensable in maintaining any technology. Lack of good documentation routinely complicates maintenance activities, prolongs troubleshooting and fixing issues, and increases upgrade costs. Operators should require that their vendors provide clean and complete documentation on all radio systems and continue to update the documentation with ongoing system changes.
- Intra-jurisdiction Funding Sources: All government agencies are required to undergo an initial and annual budget process to secure network capital and operating expenses. Over the past two decades, many radio system operators have employed grants to fund public safety radio costs. While that is commendable, the availability of grant funding varies greatly from year to year, and jurisdictional financial and budgeting offices must respect that such a critical system as the public safety network requires substantial and regular funding. It is essential that jurisdictions leverage well-established measures for justifying and obtaining recurring funds for radio systems.
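To make the incident management cycle described above a little more concrete, here is a minimal Python sketch of an incident moving through the raised, triaged, diagnosed and resolved stages, with a notice recorded for each affected user at every step. The class, field and stage names are hypothetical and chosen only for illustration; a real operator's workflow, ticketing tool and notification channels will differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    RAISED = "raised"
    TRIAGED = "triaged"
    DIAGNOSED = "diagnosed"
    RESOLVED = "resolved"


# Allowed forward transitions (illustrative; real plans often add escalation paths).
NEXT_STAGE = {
    Stage.RAISED: Stage.TRIAGED,
    Stage.TRIAGED: Stage.DIAGNOSED,
    Stage.DIAGNOSED: Stage.RESOLVED,
}


@dataclass
class Incident:
    summary: str
    source: str                      # e.g. "NOC" or "customer service"
    affected_users: list[str]
    stage: Stage = Stage.RAISED
    notifications: list[str] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage and record a notice for every affected user."""
        if self.stage not in NEXT_STAGE:
            raise ValueError("incident is already resolved")
        self.stage = NEXT_STAGE[self.stage]
        for user in self.affected_users:
            self.notifications.append(
                f"{user}: '{self.summary}' is now {self.stage.value}"
            )
        return self.stage


if __name__ == "__main__":
    incident = Incident(
        summary="Site 4 backhaul link alarm",   # hypothetical example alarm
        source="NOC",
        affected_users=["Fire Dispatch", "Sheriff's Office"],
    )
    while incident.stage is not Stage.RESOLVED:
        incident.advance()
    for note in incident.notifications:
        print(note)
```

The point of the sketch is that user notification is built into every transition rather than left as an afterthought, which mirrors the emphasis the incident management plan should place on keeping affected users informed.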
The discussion above provides only a few high-level suggestions intended to instill the necessary structure and discipline for operating costly and complex technology systems at public safety grade levels. Safeguarding LMR systems is no easy feat; having well-established, written policies, standard operating procedures (SOPs), and communications plans can ease the burden and better guarantee robust service for our first responders.
If your agency’s processes and procedures for maintaining your LMR system are in need of review or update, Televate can help you get back on the right track.