Operational resilience is described by Red Hat as ‘a system’s ability to resist losses and outages and to recover from them if they occur’. This is particularly important in the finance sector, where a failure in an organisation’s ability to provide critical services could have widespread consequences for the stability of, and trust in, the financial markets.
In a recent article, ‘A Technology Survival Guide for Resilience’, McKinsey stresses the role ICT maintenance plays in operational resilience, reminding organisations that ‘resilience needs to exist not only in the architecture and design but also through deployment and ongoing monitoring.’
DORA compliance and ICT maintenance
The Digital Operational Resilience Act (DORA), which applies from 17 January 2025 to all financial entities that conduct activities within the EU, was designed to ensure that all participants in the financial market (including banks, insurers and investment firms) can maintain operations when threatened by IT or cyber-security issues that could jeopardise the entire EU financial system.
Article 9 of DORA (under ICT Risk Management: Protection and Prevention) states that financial entities shall:
- Minimise the impact of ICT risk on ICT systems through the deployment of appropriate procedures.
- Design procedures that aim to ensure the resilience, continuity and availability of ICT systems.
- Minimise the risk of corruption or loss of data, unauthorised access and technical flaws that may hinder business activity.
- Prevent the lack of availability, the impairment of authenticity and integrity, breaches of confidentiality and the loss of data.
In summary, financial organisations must demonstrate that they carry out routine ICT maintenance procedures to minimise risk and increase resilience.
Patch management: essential to improving resilience and reducing risk
Patching (the process of applying updates to software) is often said to be the foundation of cybersecurity.
Software is prone to vulnerabilities. These are weaknesses which can cause errors or present opportunities for exploitation by cyber-criminals, resulting in compromised data and disruption to operations. Patches (packets of code) must be applied to fix these vulnerabilities so that software remains secure and resilient.
When a vendor releases a security update, it alerts cyber-criminals to a particular software vulnerability, and they then seek out unpatched copies of the software to exploit. It is therefore critical that organisations install security patches as quickly as possible to prevent software from being attacked.
To demonstrate that measures have been put in place to reduce risk and increase operational resilience, organisations must establish a regular patching schedule so that software remains protected against errors and threats.
Patching best practices
The diversity and complexity of software systems within businesses can make it difficult to stay up to date on available patches and which patches are needed. However, by following patching best practices, organisations can set up patch management processes that help them achieve DORA compliance and ensure their digital defences remain strong.
The patch management process should be broken down into the following stages: identifying, acquiring, testing, deploying and documenting. These stages are supported by the following steps:
- Identify. Create an inventory of devices, operating systems and applications.
- Classify. Rank and group IT assets and patches by level of risk and critical status.
- Plan. Create a plan of which patches should be installed on which devices and who is responsible.
- Test. Patches can introduce bugs or cause performance issues. Testing patches in a lab or sandbox environment before deployment avoids potential problems.
- Validate. Confirm patches have been installed correctly.
- Document. Keep a record of known vulnerabilities, test results and all patch deployments.
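As a rough illustration of the Identify, Classify and Plan steps above, the following Python sketch ranks a small patch backlog so that the riskiest items are handled first. Everything in it is hypothetical: the asset names, patch IDs, owners, the `SEVERITY_RANK` scale and the `plan_deployment` function are made up for the example and are not prescribed by DORA or by any particular patching tool.

```python
from dataclasses import dataclass

# Hypothetical risk scale for illustration: higher number = more urgent.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Asset:
    name: str
    business_critical: bool  # Classify: does the asset support a critical service?


@dataclass(frozen=True)
class Patch:
    patch_id: str
    asset: Asset
    severity: str  # one of the keys in SEVERITY_RANK
    owner: str     # Plan: who is responsible for deployment


def plan_deployment(patches: list[Patch]) -> list[Patch]:
    """Order patches by severity first, then by whether the asset is business critical."""
    return sorted(
        patches,
        key=lambda p: (SEVERITY_RANK[p.severity], p.asset.business_critical),
        reverse=True,
    )


# Identify: a tiny example inventory (made-up names).
payments_db = Asset("payments-db", business_critical=True)
intranet = Asset("intranet-wiki", business_critical=False)

backlog = [
    Patch("P-001", intranet, "medium", "team-a"),
    Patch("P-002", payments_db, "critical", "team-b"),
    Patch("P-003", intranet, "critical", "team-a"),
    Patch("P-004", payments_db, "medium", "team-b"),
]

plan = plan_deployment(backlog)
for patch in plan:
    print(patch.patch_id, patch.asset.name, patch.severity, patch.owner)
```

In this sketch severity outranks business criticality, so a critical patch on a minor system is scheduled ahead of a medium patch on a key system. An organisation could reasonably weight the two differently.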
Organisations should also demonstrate the efforts they have made to identify and improve on weaknesses within their patch management process. This may include setting goals, such as reducing the number of IT incidents caused by patching (by improving testing) or decreasing the time elapsed between patch availability and deployment.
Entrusting responsibility for patch management to expert third-party consultants is often a more practical solution than managing it in-house. This approach frees internal IT teams from what is often a complex and time-consuming process, whilst the organisation still reaps the benefits of best-practice patch management in security, resilience and compliance.
SABREX offers expert patching services, which can also be contracted as part of a comprehensive Maintain Resilience Subscription designed specifically to support organisations in attaining DORA compliance.
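One way to track the goal of shortening the gap between patch availability and deployment is to calculate it from the deployment records. The sketch below is a minimal Python example under stated assumptions: the log entries are invented, and `TARGET_DAYS` is an assumed internal goal rather than a figure taken from DORA.

```python
from datetime import date
from statistics import mean

# Hypothetical deployment log: (patch id, date released by vendor, date deployed).
deployments = [
    ("P-001", date(2024, 3, 1), date(2024, 3, 8)),
    ("P-002", date(2024, 3, 5), date(2024, 3, 6)),
    ("P-003", date(2024, 4, 2), date(2024, 4, 16)),
]

TARGET_DAYS = 7  # assumed internal goal, not a regulatory figure


def days_to_patch(released: date, deployed: date) -> int:
    """Number of days between patch availability and deployment."""
    return (deployed - released).days


# Lag per patch, the average lag, and the patches that missed the target.
lags = {pid: days_to_patch(rel, dep) for pid, rel, dep in deployments}
average_lag = mean(lags.values())
overdue = [pid for pid, lag in lags.items() if lag > TARGET_DAYS]

print(f"Average days to patch: {average_lag:.1f}")
print("Missed target:", overdue)
```

Reviewing these figures over successive periods gives a simple, documented view of whether the process is improving.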
Contact one of our experts today for more information.
Lifecycle planning, which relies on a detailed knowledge of the lifecycles of your IT assets, is another crucial aspect of ICT maintenance. It helps maintain security and avoid the business disruption caused by poor performance.
Plans should include the lifecycle status of all systems, purchases and subscriptions, plus information on upcoming lifecycle events across your environment.
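A lifecycle plan of this kind can be as simple as a register of assets and their end-of-support dates that is checked on a regular basis. The following Python sketch shows the idea. The asset names, dates, status labels and the 365-day warning window are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Hypothetical asset register: name -> end-of-support date (made-up values).
end_of_support = {
    "core-banking-os": date(2025, 6, 30),
    "reporting-db": date(2027, 1, 31),
    "legacy-gateway": date(2024, 11, 30),
}


def lifecycle_status(eos: date, today: date, warn_days: int = 365) -> str:
    """Classify an asset by how close it is to its end-of-support date."""
    if eos < today:
        return "unsupported"
    if eos - today <= timedelta(days=warn_days):
        return "approaching end of support"
    return "supported"


today = date(2025, 1, 17)  # fixed date so the example is reproducible
report = {name: lifecycle_status(eos, today) for name, eos in end_of_support.items()}

for name, status in report.items():
    print(f"{name}: {status}")
```

In practice the register would also need updating whenever an asset is repaired, adjusted or upgraded, since those events change its lifecycle status.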
Lifecycle planning improves operational resilience, and therefore aids compliance, by avoiding performance-related disruptions. However, because businesses tend to have a considerable number of assets, and the status of each is affected by any adjustments, repairs and upgrades, lifecycle planning can be difficult to manage.
Contracting high-quality IT consultants to perform regular lifecycle planning reduces the burden on internal IT teams whilst allowing the organisation to reap the compliance and business benefits.
SABREX offers thorough lifecycle planning services. In addition to improving security and resilience, the service also helps organisations determine the most effective course of action for the replacement of assets. This, in turn, assists with IT budgeting and reduces costs by identifying where maintaining old assets is more expensive than replacing them.
Contact us today for more information.
The role of testing in digital operations resilience
Testing enables enterprises to proactively identify errors, defects or failures before they negatively affect technology performance. It should therefore be a regular and integral part of an organisation’s strategy for maintaining operational resilience.
Testing for operational resilience should cover both health checks and performance testing of an environment. Disaster recovery testing should also be performed regularly to assess the ability of systems to resume normal service following a catastrophic event. The need for this is clear in financial services, where significant outages in critical systems and applications can cost organisations billions.
Types & benefits of operational resilience testing
Below are the types of test which should form a regular part of an organisation’s strategy for operational resilience.
A health check involves a review of the entire application stack, including application servers, databases and software installations, to ensure they meet installation requirements, follow best practices and maintain supported configurations. Potential issues, including misconfigurations, processing errors, data archiving and purge problems, and data growth, are identified and investigated.
Health testing identifies bottlenecks which may negatively affect performance or cause downtime. In addition to improving operational resilience, it also helps to make business processes more efficient.
Performance testing measures the speed, scalability, responsiveness and stability of applications under their usual workload. Performance is key to providing a high-quality user experience. As today’s customers become more demanding, user experience is often a key differentiator amongst competing brands, making it more important than ever to discover whether software meets performance requirements.
Performance testing improves operational resilience by reducing the risk of downtime. It also offers a variety of business benefits, such as improving the user experience and supporting scaling by checking how the application functions under a larger workload.
High-availability systems are designed to ensure server failures do not negatively affect users. High-availability testing focuses on the mechanisms that provide this and appraises them against industry best practices. This enables potential weaknesses, errors or bugs to be identified and eradicated before they negatively affect availability.
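To make the idea concrete, the Python sketch below times a stand-in operation under a usual workload and under a larger one, then summarises the latencies. It is only an illustration: `handle_request` is a hypothetical placeholder for the real application call, and the workload sizes and request counts are arbitrary.

```python
import time
from statistics import median, quantiles


def handle_request(n: int) -> int:
    """Stand-in for the operation under test (hypothetical workload)."""
    return sum(i * i for i in range(n))


def measure_latencies(workload_size: int, requests: int) -> list[float]:
    """Time each request individually and return the latencies in seconds."""
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        handle_request(workload_size)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarise(latencies: list[float]) -> dict[str, float]:
    """Report the median and the 95th percentile latency."""
    p95 = quantiles(latencies, n=20)[-1]  # last of 19 cut points = 95th percentile
    return {"median_s": median(latencies), "p95_s": p95}


usual = summarise(measure_latencies(workload_size=1_000, requests=50))
larger = summarise(measure_latencies(workload_size=10_000, requests=50))

print("Usual workload: ", usual)
print("Larger workload:", larger)
```

Comparing the two summaries against an agreed performance requirement shows whether the application still meets it as the workload grows. Dedicated load-testing tools do the same thing at far greater scale and realism.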
Testing provides insight into a system’s capability for high-availability and improves operational resilience by preventing failures from occurring. It also allows organisations to demonstrate they have designed procedures to ensure the resilience, continuity and availability of ICT systems, a key aspect of achieving compliance with DORA regulations.
Disaster recovery assessment
Most organisations recognise the importance of creating disaster recovery plans to enable the business to quickly resume operations following a disruptive event, such as an IT failure or cyberattack. However, the effectiveness of these plans cannot be known without disaster recovery testing.
By simulating disruptive situations, disaster recovery testing establishes the actual capabilities of systems and operations to recover, helping to identify weaknesses and to improve the effectiveness of future DR planning.
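The results of a simulated outage are easier to act on when they are compared against agreed recovery targets. The Python sketch below assumes each service has a target recovery time, a concept this article does not define, and uses invented service names and timings to show how gaps might be identified.

```python
from datetime import timedelta

# Hypothetical recovery targets per service (assumed values for illustration).
recovery_targets = {
    "payments": timedelta(hours=2),
    "customer-portal": timedelta(hours=4),
}

# Hypothetical recovery times measured during a simulated outage.
measured = {
    "payments": timedelta(hours=1, minutes=30),
    "customer-portal": timedelta(hours=5),
}


def find_gaps(targets: dict, results: dict) -> dict:
    """Return services whose measured recovery exceeded the target, with the overrun."""
    return {
        service: results[service] - target
        for service, target in targets.items()
        if results[service] > target
    }


gaps = find_gaps(recovery_targets, measured)
for service, overrun in gaps.items():
    print(f"{service} exceeded its recovery target by {overrun}")
```

Each gap found this way points to a weakness that can be remedied and then retested in the next exercise.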
Disaster recovery assessments improve operational resilience by proactively alerting your organisation to weaknesses, which can then be remedied before an event occurs. The documentation from disaster recovery testing also helps achieve compliance by demonstrating that your DR plans are evidence-based. This information may also be used to build trust in the resilience of your organisation.
Discover how DORA may impact on your IT systems maintenance and remediation plans by booking a FREE SABREX DORA Consultation today.
Article by C. James