
[News] Mass IT Outage



ozzygull

Well-known member
Oct 6, 2003
4,159
Reading
As someone who works in IT, when I see these outages on the news, the sense of relief I feel knowing I had nothing to do with it is massive. I just imagine the fallout within that company as they try to find out who is ultimately to blame for the catastrophic error. I used to support software for mobile telecoms operators in the early 2000s. The stress of upgrading their systems, where one mistake could bring down their networks and affect millions of customers, was massive. Even though everything had been tested to death and the implementation and documented rollback plan were in place, I still used to feel sick until everything was back online and tested working.
 






BBassic

I changed this.
Jul 28, 2011
13,043
Imagine being the person who signed that one off for release, knowing that you effectively shut down large chunks of the Western world: airports, access to healthcare, financial services.

Wow.
I sincerely hope that one singular person isn't blamed for this thing.

A f**k up of this magnitude should be looked at in terms of systems and process and testing methodology rather than "Dave, ya f**kin' eejit! Look at the mess you've made."
 




raymondo

Well-known member
Apr 26, 2017
7,336
Wiltshire
As someone who works in IT, when I see these outages on the news, the sense of relief I feel knowing I had nothing to do with it is massive. I just imagine the fallout within that company as they try to find out who is ultimately to blame for the catastrophic error. I used to support software for mobile telecoms operators in the early 2000s. The stress of upgrading their systems, where one mistake could bring down their networks and affect millions of customers, was massive. Even though everything had been tested to death and the implementation and documented rollback plan were in place, I still used to feel sick until everything was back online and tested working.
And one can be as thorough as possible, but still not think of every possible scenario.
Yeah, I've felt your stress 😬, different business environment but still...
 






Dave the OAP

Well-known member
Jul 5, 2003
46,761
at home
This, totally agree. WTF is DR????
Disaster recovery. BC = business continuity.

Remember the Commercial Union bombing in London back in the '70s? It all came out of that, when it became obvious that companies needed some sort of scheme whereby, if a disaster occurs, they can relocate somewhere else, restore the pre-disaster backups of their systems, and carry on.

IRA bombs were designed to disrupt business, and the Bank of England made all banks and finance houses get DR contracts and test their systems once a year… that was my job!
 


dwayne

Well-known member
Jul 5, 2003
16,259
London
Disaster recovery. BC = business continuity.

Remember the Commercial Union bombing in London back in the '70s? It all came out of that, when it became obvious that companies needed some sort of scheme whereby, if a disaster occurs, they can relocate somewhere else, restore the pre-disaster backups of their systems, and carry on.

IRA bombs were designed to disrupt business, and the Bank of England made all banks and finance houses get DR contracts and test their systems once a year… that was my job!
Dave, you still haven't told me how you mitigate the risk with something like this ;)

Crowdstrike just pushes the updates out willy-nilly. Unpicking it in AWS has been complex for some systems I deal with. It has taken a rebuild, which isn't fast. It's also quite bespoke, not something that can be sorted with additional AZs or regions.
 




US Seagull

Well-known member
Jul 17, 2003
4,637
Cleveland, OH
Interestingly, Crowdstrike CEO George Kurtz was once the CTO of McAfee. In fact, he was their CTO in 2010 when a botched McAfee definition update quarantined a core Windows XP system file and sent machines around the world into reboot loops.

Call me old-fashioned, but I think you only get to cause a massive, global PC outage of critical infrastructure once.
 


Springal

Well-known member
Feb 12, 2005
24,779
GOSBTS
Crowdstrike just pushes the updates out willy-nilly. Unpicking it in AWS has been complex for some systems I deal with.
That's kind of why they've been so efficient for so long, in my experience. Sadly, with any software there is this kind of risk.
 






WATFORD zero

Well-known member
NSC Patron
Jul 10, 2003
27,747
From the US: in an interview with the Crowdstrike CEO, he was asked why all the systems don't have backups. Like that's his problem too.

I saw that. Looks a bit shell-shocked. Don't envy his job today.

And that's why he gets paid the big bucks :wink:

I retired from IT 15 years ago, but one thing always rang true: 'a junior person cannot make a massive cock-up, you need to be senior to do that'. It's the truth, but not always reflected in the outcome :down:
 


Dave the OAP

Well-known member
Jul 5, 2003
46,761
at home
Dave, you still haven't told me how you mitigate the risk with something like this ;)

Crowdstrike just pushes the updates out willy-nilly. Unpicking it in AWS has been complex for some systems I deal with. It has taken a rebuild, which isn't fast. It's also quite bespoke, not something that can be sorted with additional AZs or regions.
To be fair, the only way you mitigate this is to test the arse off any updates before they hit the systems. It is obvious that in some companies Crowdstrike is the single point of failure (anti-virus software is always robust, but throwing thousands of developers at it, each using their own take on risk, is again a point of failure).

When you have malware-style software that in effect mimics AI by infecting itself into the very coding level of AWS, that is a recipe for disaster and a ball ache for you and your AWS colleagues.

Is there an answer? Rein in these large software houses in India and the Far East and instil in them a mantra of TEST, TEST, TEST, not just GET IT OUT AS CHEAP AS YOU CAN.

You must have seen this first hand using AWS.
 








Scappa

Well-known member
Jul 5, 2017
1,580
As someone who works in IT, when I see these outages on the news, the sense of relief I feel knowing I had nothing to do with it is massive. I just imagine the fallout within that company as they try to find out who is ultimately to blame for the catastrophic error. I used to support software for mobile telecoms operators in the early 2000s. The stress of upgrading their systems, where one mistake could bring down their networks and affect millions of customers, was massive. Even though everything had been tested to death and the implementation and documented rollback plan were in place, I still used to feel sick until everything was back online and tested working.
Transpose that to a food safety setting - and the potential for something really dreadful to happen - and it resonates with the recent E. coli outbreak: sorrow and sympathy for those who were seriously affected, but a selfish, abject kind of relief that neither I nor my team were in any way implicated.
 


Iggle Piggle

Well-known member
Sep 3, 2010
5,948
I sincerely hope that one singular person isn't blamed for this thing.

A f**k up of this magnitude should be looked at in terms of systems and process and testing methodology rather than "Dave, ya f**kin' eejit! Look at the mess you've made."

As someone who has worked in corporate IT for 30 years, that is exactly what will happen. An unpopular junior will take the fall, along with a senior manager who had nothing to do with it but who they wanted to get rid of anyway.

Some clueless bell will have a look at the processes and no doubt make things worse whilst claiming they've improved things.

The senior management will congratulate themselves on a job well done whilst pocketing massive bonuses.

Don't you just love capitalism?
 


Dave the OAP

Well-known member
Jul 5, 2003
46,761
at home
As someone who has worked in corporate IT for 30 years, that is exactly what will happen. An unpopular junior will take the fall, along with a senior manager who had nothing to do with it but who they wanted to get rid of anyway.

Some clueless bell will have a look at the processes and no doubt make things worse whilst claiming they've improved things.

The senior management will congratulate themselves on a job well done whilst pocketing massive bonuses.

Don't you just love capitalism?
100% agree with this
 




beorhthelm

A. Virgo, Football Genius
Jul 21, 2003
36,013
To be fair, the only way you mitigate this is to test the arse off any updates before they hit the systems.
There's the problem: it looks like people don't test updates and just allow them to roll out as soon as they're issued. Having some passing involvement in security policy, I wager it's because policy, usually with too little IT input, dictates this to ensure arse-covering. I remember when sysadmins would have every release go through their test systems first, before it even hit the dev servers.
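To put it in concrete terms, something like the sketch below is all "test systems first" really means. It's a minimal, hypothetical Python example (the host names, update name and health check are all made up, not anything Crowdstrike or AWS actually expose): an update is applied to a small test ring first and only promoted to the production ring if every test host stays healthy.

```python
# Hypothetical sketch of ring-based update promotion. Nothing here models a real
# vendor's API; it only illustrates the "test ring before production" idea.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Host:
    name: str
    healthy: bool = True


def deploy(update: str, hosts: List[Host]) -> None:
    # Stand-in for whatever actually installs the update on a host.
    for host in hosts:
        print(f"applying {update} to {host.name}")


def promote_update(
    update: str,
    test_ring: List[Host],
    prod_ring: List[Host],
    health_check: Callable[[Host], bool],
) -> bool:
    """Apply the update to the test ring; only roll it out to production if
    every test host still passes the health check. Returns True if production
    was updated."""
    deploy(update, test_ring)

    # In real life this would wait out a soak period and gather telemetry;
    # here the supplied health check runs immediately.
    failures = [h.name for h in test_ring if not health_check(h)]
    if failures:
        print(f"blocking {update}: unhealthy test hosts {failures}")
        return False

    deploy(update, prod_ring)
    return True


if __name__ == "__main__":
    test_ring = [Host("test-01"), Host("test-02", healthy=False)]
    prod_ring = [Host("prod-01"), Host("prod-02")]

    # The broken test host stops the update ever reaching production.
    promote_update("sensor-update-291", test_ring, prod_ring, lambda h: h.healthy)
```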
 


jcdenton08

Offended Liver Sausage
NSC Patron
Oct 17, 2008
14,483
What’s the latest on the ground? Everything fixed?
 

