Thursday, 27 April 2017

Server Admin : Ubuntu 17.04 thinks it's Ubuntu 12.04???

Yeah, I'm serious. I've taken time tonight to look at the release of Ubuntu Server 17.04, specifically to set up a new mini-server, which is to be Core 2 Duo powered and on 24/7 as a bootstrapper and service-strapping server itself.

But before I run I like to walk, so I set up a 2-core, 1GB RAM VMware machine from the 17.04 ISO... Take a look at the first thing it presented to me....


Yes, just read that again... I booted the server... and the only action I took was to log in... Welcome to 17.04... All good...

Wait, what?.. Why am I being warned to upgrade my 12.04? This is 17.04?

Before I ran around like my last vestiges of hair were on fire, I decided to do a simple test. I've previously found that Ubuntu often goes wandering off on the internet for message of the day (motd) information, so I pulled the (virtual) network card out of the machine.

This results in a long boot time, but at least you know no remote files or services are going to be listing things on your screen...


Five minutes later, I get to see what the system says... From experience, I think Canonical's motd can show some news, up to 40 lines of 80 characters. I've never seen that much, but it has been a while since I looked at their motd scripts.

After logging in I still got the message. However, a fresh install (without any network) didn't show it, so I believe the install we see here cached something from an online source.

Anyway, taking a look in the /etc/update-motd.d folder, you can see a series of numbered scripts. They are numbered so that Canonical, or you yourself, can add message of the day scripts and keep them running in the order you see them.


Checking "00-header", we see just the usual login welcome.

Then "10-help-text" is the three lines about documentation, management and support. I actually add "#" to the start of each of those lines to stop that file's actions; I don't delete the file though, just in case.

The next file, "50-motd-news", looks to be the culprit... I'm not even going to look inside it, because I can see the next file in the folder is "90-updates-available", and in the login the updates-available lines come right after the message I want to be rid of....

So this strange, confusing message comes from "50-motd-news". I'm going to cut to the chase and kill that file.

And now my login is much neater. I have added a call to "ifconfig" in 10-help-text, and my login is now clean of this strange message. But I'm not impressed this has gone on, and I'm going to have to look through all these other motd scripts to see what my server is fetching, and from where.... Hmm.
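If you'd rather switch a script off than delete it (in the same spirit as commenting out 10-help-text), a gentler option is to clear its executable bit: as I understand it, pam_motd only runs the executable scripts in /etc/update-motd.d, so "sudo chmod -x /etc/update-motd.d/50-motd-news" should silence it and "chmod +x" brings it back. Here is a minimal bash sketch of that idea; the helper names and the MOTD_DIR override are my own, hypothetical additions.

```shell
#!/bin/bash
# Sketch: inspect and disable update-motd.d scripts without deleting them.
# Assumption: pam_motd only runs *executable* files in this folder, so
# clearing the executable bit switches a script off.
# MOTD_DIR is a hypothetical override, handy for trying this on a scratch folder.
MOTD_DIR="${MOTD_DIR:-/etc/update-motd.d}"

# Print the names of the scripts that will run at login, in name order.
list_active_motd() {
  local f
  for f in "$MOTD_DIR"/*; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      basename "$f"
    fi
  done
}

# Disable one script by name, e.g.: disable_motd 50-motd-news
disable_motd() {
  chmod -x "$MOTD_DIR/$1"
}

# Re-enable a script that was disabled with disable_motd.
enable_motd() {
  chmod +x "$MOTD_DIR/$1"
}
```

Run it as root (for example, source the file in a root shell, then call disable_motd 50-motd-news) and log in again to check; the file stays on disk, so nothing is lost if you change your mind.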

Monday, 24 April 2017

Tech Office: Talking Dress Code

Let's put this clearly: I don't believe in a set dress code (as in defining what anyone can wear) in technology. I've heard anecdotes from the history of IBM, where they insisted on suits, sock garters and all sorts of things, and I've seen companies demand you wear a suit. Indeed, I used to work in formalwear production and had to wear a suit. I like suits; however, I don't believe they have a place in the everyday development office.

If you're seeing customers, if you're "forward facing", then sure, dress up. But smart casual is enough for me. So what is "smart casual"?

Jeans, sure. Shirt, sure. T-shirt if it's plain, sure. What's not?... Well, how about a 30-year-old sagging woollen jumper? Please no, just no.

What about a bright orange flared-collar shirt with its top three buttons missing, worn by a hairy-chested 1970s throwback? Please god no.

What about shoes? Well, I wear a nice pair of Rockport leather shoes; they're functional, comfortable, ageing but decent looking... Trainers? I have no objection to trainers. Sandals... No... No sandals.

Shorts?... Please No, just no.

Combat trousers? I'm fine with them, so long as you're not putting ammo in the pockets, oh, and they're a plain colour.

Then you come to the stickier points. It's kind of dress code: medical considerations aside, how do you smell? I've worked with people, not directly in my team, who stink. Bodily, as though they don't wash, but also as though they've been dipped in something... Chemicals, paints, cigarettes, bad breath, BO... It's just not acceptable, and it's not polite when you're working elbow to elbow. If you smell so bad that people give you a wide berth, or close their mouths as they come past your desk, beware: you will be asked to spray.

That said, what do you spray?  Please god, nothing totally stinking...

Thursday, 20 April 2017

Sys-Admin/Dev Ops : Assumption is Danger

As a systems admin, or dev ops, or whatever your job title might be, never ever assume that the person you're handing a system to has a clue.  This might seem harsh, but it's true, and proves itself true time and time again.

"Assumption is the mother of all f**k ups"

About a year ago I deployed a system which automatically sent requests to remote machines (via SMS), getting those machines to report their status, send back error information and gather some basic information.

It has run happily for a whole year and it has all been pretty plain sailing. The hours and hours of work I put into automating it and keeping it self-sustaining have paid off: zero faults, zero downtime. Self-regulation is the way forward for me; even if it took slightly longer to put the system in place, it has needed no human input for nearly a year!

However, about a week ago the unit needed to move. It had to be physically picked up and taken out of my small server room and into the official server room, basically a dark cupboard controlled not by me or my cohort, but by the IT boffins.

Fine. I notified the customers, went off to the IT area, sorted out who I was to hand it to and physically delivered it to the chap. I watched him start to plug it all back together: power, wires, boot, fine....

I assumed he'd do this seamlessly....

Until this morning; well, a morning last week, as I post these with a date in the future. That morning was hell. I walked into a wall of customers unable to get to their machines. The Easter weekend was looming, performance needed to be monitored and customer sites didn't have their regular staff; explaining to temporary cover staff that the system would be off was not a prospect I relished.

To be frank, there was a lot of flapping going on, more than I expected... IT reported the system back online, but the customers didn't stop flapping... Indeed, none of the estate seemed able to connect in... 1 hour, 2 hours; I asked the boffins to check it time and again. "It's fine", they told me.

I look locally, I can't see the controller machine on the network, I can't see it through the remote management console... Where the hell is the machine?

I assure the customers I'll have answers within the hour, and I hit social media with the same. This is going very public, and I'm rather annoyed: for a whole year things have run seamlessly but been ignored, and now it's offline for a scheduled purpose, everyone is complaining. I do not want my success wiped away in a flood of negative press.

I call the IT boffins... "we'll look into it"... No, no no, you'll get onto it right now, not look, not glance, answers are needed.  Action from you is needed before my Re-Action goes nuclear.

I wait, five minutes, I was willing to give them ten.... My phone rings...

Them > "Hello?"...
Me > "Answers?"...
Them > "Yeah, you know when you brought it back?"...
Me > "The Machine?"....
Them > "Yes"...
Me > "I remember, why?"...
Them > "Well, it has power"...
Me > "Good"....
Them > "Not really"...
Me > "Why not?"...
Them > "Because that's all it has, it's not been plugged into the network"

I hung up.  They plugged it into the network, I had a slew of data come through... The customers were pacified.

I however was not.

There was an on-the-spot review: first, the IT bod who did this was held to account; second, I was held to account for not noticing.

As for not noticing, I admit that, having had it run cleanly for a year, I had turned off the performance reports, and I admitted I had assumed a network machine handed to an IT bod would be plugged into the network. People were not happy, least of all me, but that was the fallout.

However, I then had to do a tertiary clean-up. After the Easter break I spoke to three of my main customers, trusted operators, the actual folk who should have been using the machines at the remote sites (not temporary staff), and asked them why they had not noticed. The replies... "Because it had worked for so long without an issue", "like you make it work, so we just guess it always is" and "we didn't notice it was offline".

They were very much putting everything in my court; assumption on the part of all parties was to blame.

The lessons for me are to keep checking, keep monitoring, and use my automation to report status, to fix faults and, if human errors creep in, to let me know.

I'm now off to spec up a service I can run on one of my own servers, just to ping the network machine which went AWOL and receive a report from it letting me know what it's up to. This might be a bit of Python or just bash on a cron task, but it's going to be something rather than nothing.
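As a starting point for that service, here is a minimal bash sketch meant for a cron task. It assumes the controller accepts TCP connections on some port (SSH on 22 is my guess, not the real setup) and that your bash supports /dev/tcp redirections, as most Linux builds do. watchdog.sh, host_up and check_host are hypothetical names. It tries to open a connection with a 5-second timeout and logs the result with logger, at error priority when the machine can't be reached.

```shell
#!/bin/bash
# watchdog.sh (hypothetical name): report whether a machine accepts TCP
# connections and log the result to syslog. Usage: watchdog.sh HOST PORT
# Assumption: the monitored machine listens on the given TCP port.

# host_up HOST PORT: succeed if a TCP connection opens within 5 seconds.
# Uses bash's /dev/tcp redirection; timeout(1) caps how long we wait.
host_up() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_host HOST PORT: log reachability, at error priority if it is down.
check_host() {
  if host_up "$1" "$2"; then
    logger -s "watchdog: $1:$2 reachable"
  else
    logger -s -p user.err "watchdog: $1:$2 UNREACHABLE"
    return 1
  fi
}

# Only run when given a host and port, so the functions can also be sourced.
if [ $# -eq 2 ]; then
  check_host "$1" "$2"
fi
```

A crontab line such as */5 * * * * /usr/local/bin/watchdog.sh 192.0.2.10 22 would check every five minutes; the path and address are placeholders (192.0.2.10 is a documentation-only address). A TCP connect is used here rather than ping because it also confirms a service is answering, not just that the host is on the network.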

I will NOT assume again.

Tuesday, 18 April 2017

Software Development : Failed to get Agile

I've just been party to a conversation about a project elsewhere in my workplace. My team is not involved; I was observing passively (alright, alright, I was earwigging).

The conversation was quite heated: one member of staff was adamant things were fine, whilst another was adamant they were inadequate. The two of them were at complete loggerheads. The driver of the conversation ran like this:

"We're not really designing software, we're asking everyone's opinion, writing it all down and only picking the things we really need to do"

As an agile developer, this is essentially how I run my team: we write every possible item down, everything, and I weight them and schedule them, and during our sprint hand-overs we reorganise who is going to tackle different parts of the system, to share the experience around.

This chap, however, was incredulous... He cast "WRITING EVERYTHING DOWN" as a bad thing... He only wanted to do the things he saw fit: he wanted to sit down, look at the specification, produce an analysis and ONLY do what he suggested.

This would have made perfect sense to me towards the end of my academic study of software development, before reality struck home in the workplace, and I was flabbergasted to hear this chap working twenty-something years in the past.

I mean, he's old... This company is old... But, not that old surely?

I literally caught myself tipping my head to one side as if trying to pour those words, and the way they were said, back out of my brain.

He didn't stop there, though. He sat and, without knowing it, essentially dismissed the complete concept of Agile development as absurd; at least Agile as I use it...

"You'd be constantly juggling which task to do next, swapping people on and off tasks.... What would you do?  Meet daily, what would be the point?"

I'm not sure whether this was genuine inflexibility or a deliberate effort to derail the adoption of Agile beyond the scope of my own team. Whichever it was, it sounded and felt extremely awkward.

It makes me wonder whether anyone outside my team actually uses Agile processes around here...

Sunday, 9 April 2017

Server Admin : How Good is your Backup?

How robust is your backup solution? Go on, be honest with yourself, how good is it?... Because I've seen a whole host of them and, at this very moment, this is the screen up on one of my servers....


Yes, my RAID 5, just a test RAID 5 with three really bad recycled SAS drives in it, has failed. This doesn't surprise me, but it does delay me because I now have to rebuild the data... However, I know my data is good.... Let's see how good my backup is.

This backup is a raw image of the virtual disk created with dd, stored on, and soon to be lifted from, my NFS-accessible, ZFS-mirrored backup server.

So you would be right to ask: why are you rebuilding the virtual RAID disk in the screenshot above? Well, I'm going to test my backup strategy!

I popped the known-bad disk and the good disks out and replaced all three, so I can test a restore to a new virtual disk set. I have a USB boot drive ready; this is a test.

This kind of test, a real live restore, is sorely missing from so many enterprise setups, so ask yourself: is your backup going to work?
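One way to make such a test prove something is to check, after the restore, that the new disk really holds the same bytes as the image. Here is a minimal sketch, assuming the backup is a raw dd image file and the restore target is at least as large as the image; verify_restore is a hypothetical helper, /dev/sdX and /mnt/backup/vdisk.img are placeholders, and it relies on GNU stat and cmp.

```shell
#!/bin/bash
# Sketch: prove a restore worked by comparing the restored disk with the
# raw dd image it came from, byte for byte.
# A typical restore beforehand (placeholder paths, check the device name!):
#   dd if=/mnt/backup/vdisk.img of=/dev/sdX bs=4M conv=fsync

# verify_restore IMAGE TARGET
# Compares only the first <image size> bytes of TARGET, because the
# replacement disk may be larger than the original.
verify_restore() {
  local image="$1" target="$2" size
  size=$(stat -c %s "$image") || return 2
  if cmp -s -n "$size" "$image" "$target"; then
    echo "OK: $target matches $image ($size bytes)"
  else
    echo "FAIL: $target differs from $image" >&2
    return 1
  fi
}
```

For example, verify_restore /mnt/backup/vdisk.img /dev/sdX, run as root after the dd restore, reads the whole image length back from the disk, so expect it to take about as long as the restore itself.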

Wednesday, 5 April 2017

Development : Anti-Hungarian Notation

Whilst cutting code I employ a coding style, which I enforce, whereby I mark the scope of each variable with a prefix.

"l_" for local
"m_" for member
"c_" for constant
"e_" for enum

And so forth, for static, parameter and a couple of others.  I also allow compounds of these, so a static constant would be:

"sc_"

This is useful in many languages, and imperative in dynamically typed ones such as Python.

Some confuse this with "Hungarian Notation"; it's not. Hungarian notation is the practice of prefixing the variable name with an indication of its type; for example, "an integer called count" might be "iCount".

I have several problems with anyone using Hungarian Notation, and argue against it thus. With modern code completion and IDE lookup tools it's really not needed; with useful and meaningful variable names the type is not needed; and finally, multiple types can share the same prefix. Are "bool", "BYTE" and "std::bitset" all 'b'? And what about signedness: do you compound "unsigned long" as "ul" onto the name?

It all gets rather messy, a good name is enough.

Scope is a different matter. The scope of a variable might change, the scope might not be enforced, and in non-strict languages, if you don't follow your scopes, a variable can go out of scope and then be silently re-created with a blank value.

Therefore I can justify my usage and enforcement of this coding standard.
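Bash shows the blank-value problem nicely: reading a variable outside its scope is not an error, it simply expands to an empty string. Here is a minimal sketch applying two of the prefixes above in bash (which has no members or enums); the function and variable names are made up for illustration.

```shell
#!/bin/bash
# Scope prefixes in bash (illustrative names): c_ constant, l_ function local.
readonly c_max_tries=3        # c_: constant; readonly makes bash enforce it

count_tries() {
  local l_tries=0             # l_: local; gone once the function returns
  while [ "$l_tries" -lt "$c_max_tries" ]; do
    l_tries=$((l_tries + 1))
  done
  echo "$l_tries"
}

count_tries                   # prints 3
# Out of scope: bash raises no error, it silently expands to an empty
# string. The l_ prefix makes this mistake stand out in review.
echo "[$l_tries]"             # prints []
```

Running the script with set -u would turn that silent blank into an unbound-variable error; the prefix gives a reviewer the same warning just by reading the line.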

What I can't stand, however, is when someone listens to me explain this, reads my coding standards document, even goes as far as having me reject their code during peer review for these reasons, and then dismisses my comment with "it's just Hungarian Notation"... Scope is not type, and type does not define scope; don't be fooled!

Friday, 31 March 2017

Linux Server Admin : Bash Kill Processes By Common Name

On my Linux server I recently wanted to go through and kill a bunch of application instances in one go. This is a server where students have been connecting and running various programs under Python, so I want to remove any process called "python".

We can see these in our bash shell with the command:

ps aux | grep python

To remove all these programs I create the following bash shell script:

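#!/bin/bash
# killpythons.sh: kill every process whose name starts with "python".
# pgrep prints just the matching PIDs, one per line; looping over the output
# of "ps aux | grep python" would hand kill every word of each line instead.
# The ^ anchor stops the pattern matching this script's own name.
k=0
for i in $(pgrep '^python')
do
  k=`expr $k + 1`
  kill -9 "$i"
done
logger -s "Closed $k Python Instances"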

Notice k=`expr... This is NOT a single quote (apostrophe), it is the backtick (grave accent); on a UK English keyboard this is the key to the left of the number 1. Backticks perform command substitution: the command inside them is run and its output is put in its place, so k becomes the output of "expr $k + 1", i.e. k+1. More about Command Substitution in Bash here.

The call to logger -s places the message both on screen and in syslog for me to review later.

This simply loops through all the matching processes and kills them off. I've saved this as a ".sh" file, added executable rights with "sudo chmod +x ./killpythons.sh" and set it to run as a cron job every day at 3am (a pretty safe time, unless I have some students burning the candle at both ends).
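kill -9 (SIGKILL) can't be caught, so any program that would tidy up on a normal termination request never gets the chance. A gentler variant asks first with SIGTERM and only uses SIGKILL on anything still running after a grace period. Here is a minimal sketch using pgrep and pkill from procps; graceful_kill is a hypothetical helper and the 10-second default grace period is an arbitrary choice.

```shell
#!/bin/bash
# graceful_kill PATTERN [GRACE_SECONDS]  (hypothetical helper)
# Send SIGTERM to every process whose name matches PATTERN (an extended
# regex, as pgrep/pkill use), wait, then SIGKILL any survivors.
# Prints how many processes matched at the start.
graceful_kill() {
  local pattern="$1" grace="${2:-10}" count
  count=$(pgrep -c "$pattern")    # pgrep -c prints the number of matches
  if [ "$count" -eq 0 ]; then
    echo 0
    return 0
  fi
  pkill "$pattern"                # pkill's default signal is SIGTERM
  sleep "$grace"
  if pgrep "$pattern" >/dev/null; then
    pkill -9 "$pattern"           # anything still running gets SIGKILL
  fi
  echo "$count"
}
```

In the cron script this could replace the loop with n=$(graceful_kill '^python') followed by logger -s "Closed $n Python Instances", and a crontab entry like 0 3 * * * /usr/local/bin/killpythons.sh gives the 3am run (the path is a placeholder).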

That's everything about the bash script, for those of you wondering about the students, they're those folks following my learning examples from my book, which you can buy here.