Monday 4 March 2024

Tech Tribulations #2: There's a New System in Town


Time for another story from my tech history; this time I point my torch of memory at one of the first larger systems I worked on from start to finish (nearly 10 years).

I think I can say the product name of this particular system: it was called Phoenix, and it wasn't a bad idea, to be honest.  As with nearly every project I've ever worked on we needed more documentation and design; those two continuously neglected facets of software engineering.  But other than that, the initial design stood up.

Before we dive into that though, we need to understand the reason Phoenix was thrown onto a whiteboard and born.  What was the direct predecessor?

Yes, there was one; this is not a blue-sky thinking system.  Phoenix was to directly replace a system which had itself been engineered over the prior two years and not really lived up to expectations.

I shan't name that system, but anyone who knows me and knows Phoenix may know the code name it was given.  It was a hybrid of two languages: a C wrapper around the physical hardware, bound with JNI to a Java core, which itself used the ODBC driver to talk to a Marathon database backend.

The idea being that the transactional nature of the database allowed for the representation of perfect state, and in any error it was simply a case of rolling the transaction back, and so we had RORI behaviour in a very complex system.

The problem?

It was SSSSLLLLOOOOWWWWWW.

The host machine was (and I think I'm right here) a Pentium II class machine (it may have been a Pentium III later) with 16MB of RAM (and I believe it went up to 64MB by the end).  I am talking about early 2005 here; we were working with legacy machines in the field which were being refitted from a system itself all written in C but not well understood.

This new modular java, database, transactional system was envisioned to be the future!

The problem?  Still it was SLLOOOOOOWWWWWW......

I've said that twice now, so for effect let's skip forward to about 2006.  Two years in and this database Java monstrosity is not doing well: it doesn't perform well, the team has literally stopped even trying to work in agile scrum manner, and the lead developer didn't take well to critique, amongst other issues (like it's slow, have I said that three times now?)

Happily however there were two separate development machines, the one on the old old system and a whole other on the java junk.  So it was in the C group I was called into a meeting with my manager, the manager of the whole department.

He had a punch bag doll in front of a whiteboard and asked me to close the door.  I was furtively expecting my P45 (to be fired), but he asked me to take a look at the board... he had asked two other engineers to take a look through the day, so he was canvassing for opinion.

In the dry wipe ink was a block diagram, eight boxes, a hardware module to talk to hardware, a module to account, a module to host the menu, a module to talk to the content program, and so on, one for logging, one for sending reports back to HQ... These simple systems.

Each would talk to one another with a message passing mechanism, which was a line from each of these eight boxes.  Little did I know then that this message passing system would be the biggest part of my debugging life for the next ten years; that however is another story.
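To give a flavour of what "a line from each of these eight boxes" means in code, here is a minimal sketch of modules passing messages through a shared router.  None of this is the real Phoenix code; Message, MessageBus, Register, Send and the module names are all made up for illustration, and the real mechanism was certainly more involved.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message: the fields are illustrative, not the real layout.
struct Message
{
    std::string from;
    std::string to;
    std::string body;
};

// Hypothetical router: each module registers a handler under its own name.
class MessageBus
{
public:
    using Handler = std::function<void(const Message&)>;

    void Register(const std::string& module, Handler handler)
    {
        mHandlers[module] = std::move(handler);
    }

    // Returns false when no module is registered under the target name.
    bool Send(const Message& message) const
    {
        const auto found{ mHandlers.find(message.to) };
        if (found == mHandlers.end())
        {
            return false;
        }
        found->second(message);
        return true;
    }

private:
    std::map<std::string, Handler> mHandlers;
};

// Example wiring: the menu asks the logger to record a launch.
std::vector<std::string> DemoLaunch()
{
    std::vector<std::string> logged;
    MessageBus bus;
    bus.Register("logger", [&logged](const Message& m) { logged.push_back(m.from + ": " + m.body); });
    const bool delivered{ bus.Send({ "menu", "logger", "launched content" }) };
    assert(delivered);
    return logged;
}
```

The appeal on the whiteboard was exactly this: no box needs to know how another box works, only its name and what messages it understands.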

Right there and then this new system seemed like a much better proposition.

The manager liked the feedback from the various folks he had to talk to, and it was decided upon.  He would work on the platform, starting up the system, securing it with a manifest and preventing unauthorized execution of random code and content.

I would be working on the main attract module, this is the main menu from which content is categorized, and launched.

The content would talk to the Game Interface module, which was owned by a guy called Darren.

The accounting module was handled by the previous lead on the java system, and later a guy called Conrad.  Conrad at first though would be doing some other systems.

And there was a smattering of other work, like a logger module, and a build stack and allsorts.

Work began, but not before the lead on the then current system had to be told her baby was being canned.  She and her main ally were duly summoned, shown the block diagram and where they would be working.  She did not take it well.

As she left the manager's office, he was heard to exclaim, and his stapler was thrown across the space and nailed a window barrier, sticking into it with a vibrating twang.

She loved her system, she did not like Phoenix, not least as its name was a direct reference to something rising from the ashes, her system being declared to have self-combusted in failure.

Monday 26 February 2024

Tech Tribulations #1 : Smartcard Release Drama

It has been a very long time since a story time, so I thought I'd go over one about a software system I wrote from the ground up to secure the service to a machine.  I worked for a company which sold a whole machine to the customer (or leased them); while ever the buyer had the machines they would run.

In late 2014 the higher management realized this was an untapped revenue stream, and much to the annoyance of the customers, it was decided that a system update would go out; which the customer had to take to get any new content; and in this update they would have to also have a smart card reader installed and a card inserted which would count down time until it ran out.

Metering essentially, but "metering" had a whole other meaning for this system already, so it was just called the "Smartcard" system.

Really it was a subsystem, bolted into the main runtime as a thread check, which would wake at intervals, query if there was a card reader on the USB at all, check it was the exact brand of card reader (because we wanted to limit the customer just being able to put any in, they had to buy our pack).

And then it would query the card and deduct credit/count down if we were beyond a day.

We tried a bunch of time spans, hours, minutes etc., but it was decided that deducting would happen after we accumulated 24 hours of on time: every 5 minutes an encrypted file on the disk would be marked, and after 24 hours' worth of accumulations the deduct would happen.
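As a rough sketch of that accumulate-then-deduct rule (and only a sketch: UsageMeter, Tick and RunForHours are names I've made up here, the plain integers stand in for the encrypted file and the card, and the real thread did far more checking):

```cpp
#include <cassert>

// Hypothetical model; the 5 minute tick and 24 hour threshold come from the story.
struct UsageMeter
{
    static constexpr int kMinutesPerTick{ 5 };
    static constexpr int kMinutesPerDay{ 24 * 60 };

    int accumulatedMinutes{ 0 }; // stands in for the encrypted file on disk
    int creditDays{ 0 };         // stands in for the credit held on the card

    // Called every 5 minutes of on time.  Returns true when a day was deducted.
    bool Tick()
    {
        accumulatedMinutes += kMinutesPerTick;
        if (accumulatedMinutes < kMinutesPerDay || creditDays <= 0)
        {
            return false;
        }
        accumulatedMinutes -= kMinutesPerDay;
        --creditDays;
        return true;
    }
};

// Runs the meter for a number of hours of on time and returns the days deducted.
int RunForHours(UsageMeter& meter, int hours)
{
    assert(hours >= 0);
    int deducted{ 0 };
    for (int minute{ 0 }; minute < hours * 60; minute += UsageMeter::kMinutesPerTick)
    {
        if (meter.Tick())
        {
            ++deducted;
        }
    }
    return deducted;
}
```

With 30 days of credit, 23 hours of on time deducts nothing; the 24th hour takes one day off the card and the accumulation starts again from zero.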

We tested this for months, absolutely months and to be honest we thought it was really robust.

Until it actually went into the customers' hands; we suddenly had a slew of calls and returns, folks unhappy that they were inserting the card, "testing their machines" and suddenly all the credit was gone, and they were asking for a new card all the time.

At first we could simply not explain this anomaly, we had the written information about the service calls, replicated what the folks were saying, it all checked out fine, we got our increments, we could inspect the encrypted file and see we were accumulating normally and deducting normally.

I worked on this for days on end, we had to test for real, we did all sorts of things, power drop tests, pulling the card tests, all sorts.

The machine checked out, end of.

What had we missed?  What are the customers doing?  Testing the machine, okay, what are they testing?  The content, how does the content work for this update?  Well, seems that the customers didn't trust their testers or engineers, so what was happening was instead of testing for real they were doing what we called "Open door testing".

You see, when you close the door you accumulate and deduct, the machine is in operation normally as any user their end would have it operate....

Door open mode however, was intended to be used by service engineers, when the machine was deployed; so it is still in operation, the machine is in the field, but the door is briefly open to check things.

But these customers didn't trust their engineers in their warehouse, so they were not giving them credit to check the machine properly, they therefore tested in open mode... for days....

They accumulated massive operation debt with the machines in door open mode for days.

The moment they turned them off, happy they were working, and shipped them to sites, they'd arrive on site, finally be turned on in proper door closed operation after so long, and instantly deduct the massive debt the warehouse team had accrued.

This was intentional.... But their use of the door open mode was an abuse, and one we had not even thought about.  We didn't even clock how long a machine sat in door open or door closed mode; worse still, when in door open mode, test things on the machine ran at an accelerated update rate: we ticked over 10x faster to allow faster testing... The result was that in just 3 days of warehouse door open mode testing they could accrue 30 days of operational debt.
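The arithmetic is easy to see in a small model.  This is a hedged sketch, not the shipped code: DoorAwareMeter and its members are hypothetical names, and I'm assuming the simplest reading of the behaviour described, where time accrues in both modes but is only deducted with the door closed.

```cpp
#include <cassert>

// Hypothetical model: accrue in both modes, only deduct with the door closed.
struct DoorAwareMeter
{
    static constexpr int kTestModeMultiplier{ 10 }; // the "10x faster" from the story
    static constexpr int kMinutesPerDay{ 24 * 60 };

    long owedMinutes{ 0 };
    int creditDays{ 0 };

    // Door open: time accrues at the accelerated rate, nothing is deducted.
    void RunDoorOpen(int realMinutes)
    {
        assert(realMinutes >= 0);
        owedMinutes += static_cast<long>(realMinutes) * kTestModeMultiplier;
    }

    // Door closed: time accrues normally and every whole day owed comes off the card at once.
    int RunDoorClosed(int realMinutes)
    {
        assert(realMinutes >= 0);
        owedMinutes += realMinutes;
        int deducted{ 0 };
        while (owedMinutes >= kMinutesPerDay && creditDays > 0)
        {
            owedMinutes -= kMinutesPerDay;
            --creditDays;
            ++deducted;
        }
        return deducted;
    }
};
```

Three real days with the door open is 4,320 minutes, which at 10x becomes 43,200 owed minutes, or 30 days.  The first few minutes of door closed running on site then wipes a 30 day card.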

That was a fault, one I could tackle with the team.  But changing the user habit of leaving the door open was harder...

We had to work with the user, and their patterns, we suspended the system for a short while and issued a new update, but the first customer taste of this "pay as you go" approach was a sour one.

Then things got bad....

Yes, you might think they were already bad, but they got worse.

A month later, all the above was resolved, and we thought things were settling down... Until suddenly all hell broke loose.

EVERY MACHINE WAS LOCKED.

There were dozens of reports of their just not working, they had done their daily reboot and all of them reported a security fail on the smartcard....

All hands on deck, are our machines in the test pool doing the same?  Nope.

Is there something special going on?  Have clocks changed, is it a leap year, has the sky fallen on Chicken Little?

We honestly had no idea, there was no repeat in any of our test pool, no repeat on our personal engineering rigs, there was essentially no reason for this failure.

The only answer in such a situation is to observe or return one of the machines exhibiting the problem.

A lorry was sent and a machine brought back, under the explicit instruction not to open it nor change it, and the customer was not to keep their smartcard (it was theirs, but we would credit them a whole new card for the inconvenience).

Several hours were spent staring at the code, and running checks by lowering cards so they would expire, or pulling the reader out and inserting it again; we had no answer.

Before that arrives back with us however, let's just think about the "smartcards" we use in our daily lives.  Our bank cards go into a machine, we enter our PIN and we remove them again.  Then how about cards like your gas meter, or when you go see your GP?  They have a card you insert into a machine and it stays there all the time to validate they are the GP, or to keep your meter in operation.  If you have Sky TV and a viewing card it's the same thing; it is always in the device.

These machines are the latter kind.... Those cards are rated to have power on to them for long periods of time, as a consequence they cost more money than a card you only insert transiently...

And this company I worked for had very canny buyers, too canny.  Because they spotted a smartcard which used the same protocol... but was significantly less money to buy!

The difference?  You guessed it, it was the transient use variant.

The broken machine arrived, we powered it on, fail.  We open the door, remove the smartcard and sure enough on the rear of the plastic behind the chip the plastic is brown, burned.

The card can not be electrically trusted!

We highlight this and send it back to the buying department, they fouled up, they changed the hardware after we certified it, essentially sending an uncertified machine out.

A huge issue ensued about this, as it wasn't well understood that we had been provided and advised one card type for the update set.  Of course the buyers would not accept it wasn't the same, until we literally had the specifications of the cards side by side and could see a digit difference in the part number.  We looked up the datasheet, where it clearly said that the transient card was only rated to remain in a machine for 10 minutes.  More than enough for an ATM.  But a "security gating" card, as we wanted, is rated to be inserted continually for 36 months.

Monday 8 January 2024

The 2023 Great Laptop Shop...

It is that time of life again.... Time to shop for a new Laptop... And I am finding a new scourge in the mix, that is "Soldered".

I am so absolutely sick of seeing "Soldered" next to the RAM specifications, like I get it for budget machines, I get it for tiny integrations, but for actual working laptops?  For laptops marketed as "Business" or as "performance"... oh my word, why are they doing this?

I really hate this whole phase.  I have hated the laptop hidden gotchas for years to be honest, and I'm upset Lenovo have joined the price crunch unupgradable ewaste train.

My current machine being a Lenovo E480, I can and have upgraded the RAM, it has two drive slots (one SATA and one NVMe), you can change the battery and it has plenty of USB and aux IO ports all around.  The issues I have with it are that one RAM slot has gone bad; I simply doubled the RAM in the one remaining slot - but really I'm not at end of life.  And the CPU, being an 8th gen, is struggling with my workloads.

My options are to soldier on with the E480, maybe even move all my work into remote virtual machines and use this simply as a terminal into them; but then I would always require network access and that's not always possible.

Or get something new and locked down.

Or a Framework offering, which lets me upgrade to a point.

Or keep shopping... But the ease of finding what I want is very much on the low-side, and I'm a technical user, finding what I want should be simple.  But the deck seems stacked with both retailers and system integrators wanting to pass off this shovel ware crap on me.

Monday 23 October 2023

Job Interview : switch(char) again! Twenty years on!

I've spoken about some of the strange interviews I've enjoyed over the years; notable engineering highlights were having test driven development spooned to me, when I simply didn't know the particular syntax they were demanding, or having StringBuilder suggested for a C# task where they actually needed to cross-core thread their problem, or pointing out that the CI solution being described would lead to an alien environment of folks being paralysed by fear of red flags; only to witness their development floor being paralysed by exactly those issues as I walked out the door.

Today however, I'm going to go way back and talk about one of the very early development interviews I had; this was maybe the third interview I ever had after graduating.  It was with a rail rolling stock company, in a very bright and sunny room with three guys who seemed to know their entire universe was correct and they'd take no new information into their sphere of understanding.

I share this one as a friend of mine was in touch over the weekend, he had an interview, and strangely he had the exact same conversation I had... THE EXACT CONVERSATION, twenty something years later.

Let us start by sharing some perfectly valid C++ code:

bool validate(const char& value)
{
    switch (value)
    {
    case 0: return true;
    case 'x': return true;
    default: return false;
    }
    return false;
}

int main()
{
    const bool isValidOne{ validate('a') };
    const bool isValidTwo{ validate('x') };
    const bool isValidMe{ validate(12) };
}

Read this code, we should see an output (if we output the bools) of "false", "true" and "false"... I think we can all agree.  This is perfectly valid C++.

It isn't very clean nor pretty; we're relying on the language's implicit conversions between char and int, which is not very nice, and I wish compilers would complain here that we're comparing int to char in the first case and converting the integer 12 to char.  I think it should really strong type this.

But that is not the point, the point is that both in my friend's case and mine all those years ago the interviewers got really uppity about that "switch" statement.  They insisted that you can not "switch" on a character type...

Just flat insisted.

I remember in my experience they had me write up what they wanted on a blackboard, with chalk, and they wanted me to go through the code line by line.  I wrote more or less the above; they insisted they had a character not a byte (uint8_t: though there was no standard for this at the time), and then they went off on this tangent amongst themselves about the value of the if-elseif-else stanzas they insisted would work over my solution above.
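For the avoidance of doubt, here are the two forms side by side.  The function names are mine, made up for this comparison, and I take the char by value here rather than by const reference; the point is only that a switch on a char is legal C++ and gives the same answers as the chain they preferred.

```cpp
#include <cassert>
#include <limits>

// The switch from above, taking the char by value.
bool validateSwitch(char value)
{
    switch (value)
    {
    case 0:   return true;
    case 'x': return true;
    default:  return false;
    }
}

// The if / else-if / else chain the interviewers preferred.
bool validateIfElse(char value)
{
    if (value == 0) { return true; }
    else if (value == 'x') { return true; }
    else { return false; }
}

// Walks every possible char value and reports whether the two forms always agree.
bool formsAgree()
{
    for (int i{ std::numeric_limits<char>::min() }; i <= std::numeric_limits<char>::max(); ++i)
    {
        const char c{ static_cast<char>(i) };
        if (validateSwitch(c) != validateIfElse(c))
        {
            return false;
        }
    }
    return true;
}

// The three calls from the post: 'a' and 12 are rejected, 'x' is accepted.
void checkPostExamples()
{
    assert(!validateSwitch('a'));
    assert(validateSwitch('x'));
    assert(!validateSwitch(12));
}
```

Which of the two reads better is a matter of taste; that the switch compiles is not.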

Now I was talking to these guys in the early 00's, my friend however was talking to them in 2023.  He jumped on this with the point of "well if the character is known at compile time, yes then constexpr may be of use" and he showed them:

if constexpr (foo) and if consteval examples (the latter, from C++23, takes no condition).
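I don't have what he actually wrote, so this is my own guess at the kind of thing, using only C++17.  The helper names are hypothetical; the idea is simply that when the character is known at compile time the whole check can be settled by the compiler, with either a switch or an if constexpr.

```cpp
// A constexpr version of the switch: usable at run time or at compile time.
constexpr bool validate(char value)
{
    switch (value)
    {
    case 0:
    case 'x':
        return true;
    default:
        return false;
    }
}

// Hypothetical helper: the character is a template argument, so it is always
// known at compile time and only one branch is ever instantiated.
template <char Value>
constexpr bool validateAtCompileTime()
{
    if constexpr (Value == 0 || Value == 'x')
    {
        return true;
    }
    else
    {
        return false;
    }
}

// All of these are checked by the compiler; nothing here runs at run time.
static_assert(!validate('a'), "'a' is rejected");
static_assert(validate('x'), "'x' is accepted");
static_assert(!validateAtCompileTime<'a'>(), "'a' is rejected");
static_assert(validateAtCompileTime<'x'>(), "'x' is accepted");
static_assert(!validateAtCompileTime<12>(), "12 is rejected");
```

Note that the switch on a char is happy inside a constexpr function too, so even the compile time route does not need the if chain.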

He described their faces screwing up and two of the chaps he was talking to staring at each other before making subtle notes on their paper.

It was very clear at this point that the CTO performing the interview was not keeping his knowledge up to date; the group seemed to cluster onto the idea of just maintaining what they have.  Stagnation like this, in our joint experience, can be problematic.  Both poor team morale and high turnover of engineers always happen when things stagnate.

I have to say in my interview I actually sat it out from beginning to end and just knew I was not interested in the role; and I did not hear from them again.

My friend however, he was genuinely concerned he'd done a good interview and they'd be offering him a role, they kept him talking for over an hour and a half!

He's a late 40's engineer of 25+ years experience, just like me; the three chaps interviewing him, including the CTO, were ten or more years his junior, it seemed they were looking to back up their team with an older head to lend legitimacy to things; and they had lost two original engineers of their project to literal retirement without replacing them.

He saw the issues and he thanked them for his time and left, with a negative to whether he would be interested in going further.  This stunned them.

It has to be said, so many interviewers forget that though they are interviewing you, you are also interviewing them; like most all human interaction it is a two way street, and it's crucially important to remember that in a job interview.  We spend a significant amount of our time at work; if you are not comfortable there, or feel you are going to be acting out a role instead of filling one and making it your own, it is right to walk away.

Thursday 14 September 2023

C++: I was today years old when I learned this about shared_ptr

I was on a review for a colleague and there was this piece of code which stood out to me, I could not figure out what was being achieved.  Without being specific it was essentially

static std::unique_ptr<B> Create(const std::shared_ptr<A>& referenceToA);

Just the function prototype didn't sit right with me and the use case was even more strange to my eye for the "B" structure here being created has the shared pointer in the members.

struct B
{
    B(const std::shared_ptr<A>& referenceToA)
        : mMyA(referenceToA)
    {
    }

public:
    static std::unique_ptr<B> Create(const std::shared_ptr<A>& referenceToA)
    {
        return std::unique_ptr<B>(new B(referenceToA));
    }

private:
    std::shared_ptr<A> mMyA;
};

And my head just lost it.  To my eye this API, the create function here, is given a constant reference to the shared_ptr and what can be constant about a shared_ptr?  Well the internal reference count, or so I thought.  I believed this was a mistake, I believed the compiler would throw this one out and say "No, you can't change the reference count of this const object".

Never had I ever thought that the shared_ptr is actually referencing some other controlling block elsewhere.  My understanding was therefore flawed and I'm happy to admit naive.

So what would happen here?  Well, the B constructor actually copies the shared_ptr; the copy points at the same control block, and it therefore does and can increment the reference counter held there.

As counter intuitive as the const is, therefore, we aren't saying the reference count is constant.  The const only stops us changing the shared_ptr object itself through this reference (we can't reset or reassign it); copying it is still allowed.  It helps to read the parameter type as std::shared_ptr<A>const &  referenceToA.
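A tiny check makes the point.  CountAfterCopy and CheckCounts are names I've invented for this; the empty struct A is just a stand-in.

```cpp
#include <cassert>
#include <memory>

struct A {};

// Takes the shared_ptr by const reference, copies it, and reports the count seen by the copy.
long CountAfterCopy(const std::shared_ptr<A>& referenceToA)
{
    // referenceToA.reset();                 // would not compile: reset() is not a const member
    std::shared_ptr<A> copy{ referenceToA }; // fine: copying bumps the count in the control block
    return copy.use_count();
}

void CheckCounts()
{
    const auto original{ std::make_shared<A>() };
    assert(original.use_count() == 1);
    assert(CountAfterCopy(original) == 2); // two owners while the copy is alive
    assert(original.use_count() == 1);     // the copy has gone out of scope again
}
```

Even with original itself declared const, the copy goes through and the count moves; the count lives in the control block, not in the shared_ptr object the const applies to.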

I felt rather silly for not realising this earlier, not least as I did once write a C program to switch out the qualifiers on a function call to make things like this stand out in formatting my code!!

But it has slipped my mind! Okay! I'm old, shut up.

Here is the full code of what I put together to understand this: 

#include <cstdio>
#include <memory>

struct A
{
    A()
        : mIndex(GetNextIndex())
    {
    }
    ~A() = default;

    const int mIndex;

private:
    static int GetNextIndex()
    {
        static int index{ 0 };
        return ++index;
    }
};

struct B
{
public:
    B(const std::shared_ptr<A>& parent)
        : mParent(parent) // copies the shared_ptr, so the shared count goes up
    {
    }
    std::shared_ptr<A> mParent;
    static std::unique_ptr<B> Create(std::shared_ptr<A> const& referenceToA)
    {
        return std::make_unique<B>(referenceToA);
    }
    ~B()
    {
        mParent.reset();
    }
    const int& GetParentIndex() const { return mParent->mIndex; }
};

int main()
{
    std::shared_ptr<A> original{ std::make_shared<A>() };
    // use_count() returns a long, so it is printed with %ld rather than %i
    std::printf("Original %i: %ld\n", original->mIndex, original.use_count());
    auto createdB{ B::Create(original) };
    std::printf("Copy %i: %ld\n", createdB->GetParentIndex(), createdB->mParent.use_count());
    std::printf("Original %i: %ld\n", original->mIndex, original.use_count());
}

However, I have to say, I still don't like this; clever as it is, there's a hidden copy going on: in the B constructor the shared_ptr is copied, and its reference count increments from 1 to 2.

It does quite neatly share ownership of the pointer, but the const and the reference just make my brain spin.  So where this can only gain 10/10 plaudits for C++ smarts, it only gets a mere 3/10 on the maintainability ladder for me, and only when we have the const& together; when written as at the top of the page this drops to 1/10.

Performance is also a factor here, if we wanted better performance we would have to consider who owns the original A here.  If no-one and it is always added into the shared member in B, well move it... Create once and move it, this code doesn't communicate that as an option, but it certainly could be.
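To show what I mean by "create once and move it", here is one way it could look; this is my own sketch of an alternative, not what was in the review.  Taking the shared_ptr by value and moving it into the member lets the caller choose: pass a copy to share ownership, or std::move to hand it over with no count change at all.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct A {};

struct B
{
    // By value, then moved into the member: the constructor itself never bumps the count.
    explicit B(std::shared_ptr<A> parent)
        : mParent(std::move(parent))
    {
    }

    static std::unique_ptr<B> Create(std::shared_ptr<A> parent)
    {
        return std::make_unique<B>(std::move(parent));
    }

    std::shared_ptr<A> mParent;
};

void CheckMove()
{
    auto handedOver{ std::make_shared<A>() };
    const auto owner{ B::Create(std::move(handedOver)) };
    assert(handedOver == nullptr);           // the caller gave its ownership away
    assert(owner->mParent.use_count() == 1); // B is the only owner

    auto shared{ std::make_shared<A>() };
    const auto sharer{ B::Create(shared) };  // the copy happens visibly, at the call site
    assert(shared.use_count() == 2);
}
```

The copy has not gone away when the caller wants to keep its pointer, but it now happens where the reader can see it, which is most of my complaint dealt with.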



Wednesday 30 August 2023

tBs - My First Clan

Today I want to regale you with a story of my past gaming glory, before Eve... before WoW... Back in the days of 2002 I joined a clan.... Now, I've read Ready Player One over and over, I do not get this whole "I don't clan" ethos in that writing because back then being in a Clan was the thing.

Guilds and corps in my later gaming life are very different things, this clan was cut throat.  We were

{tBs}

The Butchers

This is where I met my long time co-killer Chaplain (wave, hi chap!) for we were together the scourge of Avalanche.


This is in the Half Life engine mod "Day of Defeat", and it didn't look this good (this is from the later Source version); all important in this game play was the tick rate of the server, most serious clans hosted their own servers and they tweaked them to the max.

I remember our server admin was a chap called Twister, but that's about all, he never much liked any input about the status of the server.  He didn't play very often, so it was always quite curious that us experienced players were not welcomed into giving him feedback.  Let me tell you, as a now professional game developer and server administrator: listen to your expert users.  They will know when things are off; they may speculate, but they will know, and they will help you spot issues up front.  They're a resource, use them, never dismiss them.


Anyway tBs were well known, playing on the European Enemy Down ladder we were very often top 10 in Europe, one time top three, playing on again off again matches against the other members of the ladder; we were effectively the second most active clan behind "Scotlands Finest Highlanders" or SFH.

At one time I myself organised our own "Butchers League" and for four weeks in the summer of 2003 I ran the website and helped organise five guest teams to play pre-arranged matches in a knock-out for the win.  SFH won.  Yes, SFH won the tBs League.  That was fine.

What wasn't fine was the reaction of another clan, OAP or "Old Aged Players".  They kicked off big time: they could not connect their players to the server, they didn't do any preparation up front, and they got really really angry, really for no reason, when they were disqualified from a match they simply could not raise five players to compete in.  They were offered a different match slot, but they even turned up to that with only four players and a bad attitude.

Within tBs ourselves there were a bunch of interesting characters, but let's talk more about the game.

Day of Defeat

I loved that game, or mod, my favourite map was Avalanche, a wholly unfair map for the allied side (and yes, I admit I enjoyed playing the German side more) but Chaplain and I were demons of the Church.

The Church, pictured above sat on the flank of the map, entered via the smashed bell tower at the top or the rubble strewn wall at the bottom, the actual doors were closed.  Inside you had two side by side, but slightly offset rooms on the ground, then rubble streaming up to a mezzanine level, with then either the option to climb a ladder to the exposed but dominating tower top, or to carry on up and through a wall to the tiled rooves beyond.

As an axis player you began from the top side, coming into the church via the tower.

As an allied player you began on the ground and were forced up through the rubble.

Defending that tunnel route was critical to holding the map, you controlled access to the opposite side for the allied players, you pinned them in their spawn.

Or you held the tower and dominated the center.

It was probably not well thought-out in design terms, but it was such a wonderful piece of level spacing I loved it.

My best memory came playing another clan, we as axis and they as allies.  For some ungodly reason they had a machine gunner run right up into my door way... and it was my door way, I had gone from spawn, swept over the roof into the church and down in sprint fashion time.  This guy thought he could take my church and lay MG fire on my team from my church tower?!  Nope.  A burst of MP40 and he was down, but so was his MG.

Now it was not often ever we took the MG to the church, it wasn't practical in the close-quarters environment; SMG was the usual call to arms.  But this day I could not resist, I lay in the rubble pounding their own MG fire down on them.  Their only hope to remove me was to pre-cook a grenade and time it perfectly to explode in my face.  They did not do this; my score was astronomical, with Chaplain bringing me a reload of ammo and then throwing himself into fire to respawn and bring me yet more.

That was our church on Avalanche.

A few other maps floated our boat, but we were just so good at that map, specialist even, that when we played on open public servers we were often accused of cheating.  Being so good someone is convinced you are cheating is just the ultimate thrill.  I've only managed it through good strong team work.

The other tBs characters though:

Weeman - constantly threatening to self-harm, a disturbed kid to say the least.
Hopper - Cheating chap, he blanked Chaplain and I after botting in WoW (and obviously so), undermining any respect we had for his "skills".
Remus - an older guy, interesting fellow to talk to.
Twister - the hard put server admin, who didn't want to know really about our issues.
Dodi - a nice lad who came and went.
Mako - Not the shark he thought he was.
and of course "The Butcher" himself, Butcher being his surname... Always in charge, but never there.

I learned a lot from this, my first clan, I learned about team work, about trusting others and appointing them correctly and appropriately, all from a computer game, backed up with my martial experience in karate I believe I'm quite rounded in letting people both prove and earn their reward and garnering them with praise for a job well done; whilst also being even handed.

All experience from a computer game, and they call it a game; well, isn't life just one big game too?


Monday 7 August 2023

Amber Valley Planning aren't very good

I think I've come to the conclusion Amber Valley Planning Department are either incompetents or simply idiots.  Having had to interact with them myself and now in relation to a nearby development, they've demonstrated incompetence which can't easily be explained... Let me explain.

You see a development was proposed, ironically the owners only sought planning permission after the fact, fine whatever get on with it...

So the ground-works and building are complete?... Yes.

The official visited the site?... Yes, there was a woman with a clipboard.

Does the block plan represent the reality on the ground?... No, not even close.

Any school child with a ruler and pencil would be able to very quickly look at two reference points on the ground, extrapolate two lines then compare them to the building work which took place.  They don't match, not even close.

Further more the building work significantly changed the levels of the area in question.  This is not mentioned on the planning permission request at all.

And the newly raised area is several hundred tonnes being held back by concrete posts... No, not structural engineering posts, we're talking 4 inch garden posts, with postcrete and concrete weather board.

In the July & August rain we've had they're already bowing ominously.

And I find these two points extremely curious, for you see we drew the ire of this department: having skimmed 2 inches of soil and spread road stone, we had to account for a "Change of level"... yet someone else can dump several hundred weight and build up a new embankment to around 6 feet of altitude?

We presented drawing plans from overhead shots, real shots from Google Earth, showing the true scale and location; it being a photograph after all.  We had to go back and be exacting, that was the order "be exacting" in the block out diagram shown.... Yet someone else can be approximately six feet off level of reality in two axes?

When this goes over, and it is likely to, could it be a danger to life?  Yep.  Will the planning department comment?  Will they heck as like, they just tick their box and waltz off, it is almost criminal in this case.

Either the planning department are incompetents, and I strongly suggest they explain their decision; oh but wait, one can not challenge them, only the applicant can appeal?!?
I must therefore err on the side of caution and suggest they're merely idiots, bureaucratic machination maniacs of the paid civil servant ilk just there to tick the boxes and dot the lowercase J's, since the service... Clearly the service is broken.

Want to know the major comment about the works carried out?  That they must ensure the hedge remain for at least 5 years.  This is a hedge in which (at least further up) I found bottles tangled in the roots with a date of 1914 on them, so this is already very much an older hedge... so a) 5 years makes no sense, b) you ignored they changed the level, c) the works do not match the plans provided - and I told you of this! and finally d) they're already subsiding, dangerously so!

Incompetence demonstrated, well done Amber Valley Planning Department, well done.


I am extremely glad my hard earned council tax money no longer goes to them; indeed I've spoken to my local duty planner, they were highly available, very interested in my requirements and helped me out immensely; Amber Valley, not so much.