Computer Programming: Robust Programming

Knowing a little about computational thinking and algorithms, along with the fundamentals of programming, will allow you to create some pretty smart programs that people will love to use and that will make you a squillionaire. Happy days. But … the world is full of thieves, brigands, and people who just like breaking stuff. For our program to work effectively we need to think about how to protect it, the same way anyone would protect their business.

Defensive Design

Designing for Defence

The first thing we need to do to stop mischief on our program is think about how it could be attacked. Once we have an idea of how things could go wrong, we then need to see how those areas can be protected. It’s important to remember that there are circumstances where something that looks like an attack on your software is not one, and vice versa. A common example is a website store that looks like it’s being attacked because a lot of people are trying to use it at once, when this is actually genuine demand that was expected. This is rare. What is more common is for sites that are under known pressure from genuine interest to also be attacked under the cover of the increased traffic. When designing for defence, think about how the customer or customers will use the program and the wider system.

Anticipating Misuse

The first step in thinking defensively is to accept that the program is never really finished: the system it runs on will get updated, you’ll be told the program doesn’t work under certain conditions, and there will be helpful updates you can make. Accepting that the program may need further work is really helpful to you as the developer, and to anyone else who takes on the program, as it means the program will be designed for change.

There are three things to consider when thinking about defence:

Error catching and event logging. When writing programs it gets very exciting when things work. However, this isn’t always the case, so it really helps to fail with grace: informing the user that the application hasn’t worked and the developers why it hasn’t worked. You may need different messages for each, as you don’t want to confuse the user or tell them exactly what has failed. Baddies will use detailed failure messages to attack again, e.g. “password needs to be 5 characters”. You will also need to record how the application is used through application logging, which can be set to record when and how the program was used. If there are errors, this helps the developer see what occurred before them. You can see application event logs in a system log viewer such as Event Viewer on Windows or Console on macOS.
Proper commenting and documentation of the program. It sounds boring, but correctly commenting what the code does will save countless hours of trying to work out what the code is supposed to do. Writing good documentation is a skill in itself and is best done with a template, otherwise it can be too arduous.
Good use of version control so you know what has changed. Version control is a way of managing changes to a product. Sometimes you release a change to a program that breaks something, and the only fix is to go back to the previous version. Good version control is essential to being a good programmer.
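
As a minimal sketch of failing with grace, the Python example below uses the standard logging module to record the technical detail for developers while showing the user only a friendly message that gives nothing away. The function name, logger name, and message wording are assumptions made for illustration.

```python
import logging

# Developer-facing log. By default this goes to the console (stderr);
# pass filename="..." to basicConfig to write to a file instead.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("loan_app")  # hypothetical application name


def monthly_repayment(amount, months):
    """Return the monthly repayment, or None if it cannot be calculated."""
    # Event logging: record when and how the function was used.
    logger.info("monthly_repayment called with amount=%r months=%r", amount, months)
    try:
        return amount / months
    except (ZeroDivisionError, TypeError):
        # Full detail, including the traceback, goes to the log for developers.
        logger.exception("Repayment calculation failed")
        # The user sees a message that does not reveal what went wrong.
        print("Sorry, we could not work that out. Please try again.")
        return None
```

Calling monthly_repayment(1200, 0) prints only the friendly message, while the log records the inputs and the traceback.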

Authentication

Input Validation

Input validation is critical in stopping your program doing things you don’t want, as mischievous people and criminals will look to exploit what the program was designed to do. On the friendly side, it’s good to only take in correctly formatted input. For example, “Enter amount to borrow” is expecting a number rather than the word ‘bananas’. On the unfriendly side, you may get something more nefarious such as an injection attack, where a hacker tries to exploit the inner workings of your program by injecting extra code to muck stuff up.

To avoid these problems there are two forms of defence we can use.

Check that what is coming in is correctly formatted: the correct data type and size, e.g. that a name field contains only text. One of the easiest checks is to simply ask for the data to be entered twice, as you may see when registering an account with a password.
Check the input before starting any processing. You can accept an input after it has passed the first check and then take a closer look inside the program before processing begins, chopping the input up, checking it in more detail, and responding appropriately if you see something dodgy. For example, if you see something like =1! then calmly log it and perhaps shut out that attempt without the user knowing.

The key to validation is that it is worth taking a little extra time at the start rather than having to pick bad input out in the middle of processing, as you don’t want it getting to the end.
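
The two checks can be sketched in Python as below for the “Enter amount to borrow” example. The function name and the limits (a minimum of 1 and a maximum of 10,000) are assumptions for illustration, not rules from any real lender.

```python
def parse_amount(raw_text, minimum=1, maximum=10_000):
    """Return the amount as an int, or None if the input is not acceptable."""
    text = raw_text.strip()
    # First check: correct type and size (plain digits only, sensible length).
    if not (text.isascii() and text.isdigit()) or len(text) > 6:
        return None
    amount = int(text)
    # Second check: look closer at the value before any processing starts.
    if amount < minimum or amount > maximum:
        return None
    return amount


print(parse_amount("500"))      # a well-formed amount is accepted
print(parse_amount("bananas"))  # the wrong type of input is rejected
```

Returning None rather than an error message keeps the decision about what to tell the user (and what to log) in one place in the calling code.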

Maintainability

A computer program is quite a fragile thing of imagination. As such, programs need to be looked at during their life to see if they are working OK. A program can work well at low use but struggle at high use. It may work well on the current operating system but not at all on the next one (or, more likely, the next, next, next one). The key is that things change and you have to think about how the program will work over time.

Refactoring and rewriting/platforming are two things that you may have to do in order to maintain the program.

Refactoring is improving how the program works by tweaking things after it has been running for a while and is doing OK, but where you left a few design shortcuts in place or hardcoded (fixed) a value that turned out to need changing. There are four ways of controlling the value of a variable: 1) fixed in code, e.g. tax = 12.5; 2) user defined, e.g. “state your tax rate: =tax”; 3) set in a configuration file; 4) defined by an environment variable, e.g. time = current_time. Either way, refactoring tends to be small changes that you need to make to maintain the code as it runs for the user (in production).
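
The four ways of controlling a value can be sketched in Python as follows. The order of precedence, the environment variable name TAX_RATE, and the use of JSON text to stand in for a configuration file are all assumptions chosen to keep the example self-contained.

```python
import json
import os

# 1) Fixed in code (hardcoded).
TAX_RATE_DEFAULT = 12.5


def load_tax_rate(config_text=None, user_value=None):
    """Pick a tax rate from the most specific source available."""
    # 2) User defined: a value typed in by the user wins if present.
    if user_value is not None:
        return float(user_value)
    # 3) Configuration file: passed in here as JSON text for simplicity.
    if config_text is not None:
        config = json.loads(config_text)
        if "tax" in config:
            return float(config["tax"])
    # 4) Environment variable (the name TAX_RATE is hypothetical).
    env_value = os.environ.get("TAX_RATE")
    if env_value is not None:
        return float(env_value)
    # Fall back to the hardcoded value.
    return TAX_RATE_DEFAULT
```

Refactoring a hardcoded value into a function like this means the rate can change without editing the code that uses it.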

Rewriting/platforming is more substantial as it involves rewriting the program from one programming language to another. This may seem like an unusual thing to do, but the reason for the change is usually one of efficiency, as the original programming language does not cope as well as other languages under large loads or in different environments. Replatforming can also mean splitting the program across more than one language to, again, help with the efficiency of the program.

Use of sub programs

Another way of improving how a program runs and is maintained is to divide the program into smaller, dedicated functions or sub programs. Dividing a program into defined functions helps maintainability, as you can make changes to one sub program and then reload it rather than having to update the whole program.

From a defensive point of view it’s handy to divide access to processes into areas that can each be protected. For example, you could have a single program that takes in a username and password, then the user’s salary, date of birth, and address, and then calculates the tax for that user. If that single main program can be messed with, the baddies have access to a lot of data. By separating it into sub programs there is less risk that personal information gets out.
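
A rough Python sketch of the tax example split into sub programs is shown below. All function names, the dictionaries standing in for stored data, and the 12.5 rate are assumptions; a real system would also store hashed passwords rather than plain text.

```python
def check_login(username, password, accounts):
    """Return True only when the username exists and the password matches."""
    return username in accounts and accounts[username] == password


def calculate_tax(salary, rate_percent=12.5):
    """Work out tax from the salary alone; no personal details are needed."""
    return salary * rate_percent / 100


def tax_for_user(username, password, accounts, salaries):
    """Main program: ties the sub programs together."""
    if not check_login(username, password, accounts):
        return None
    # Only the salary is handed on, not the date of birth or address.
    return calculate_tax(salaries[username])


accounts = {"bob": "correct-horse"}   # hypothetical stored logins
salaries = {"bob": 40000}             # hypothetical stored salaries
print(tax_for_user("bob", "correct-horse", accounts, salaries))
```

Because calculate_tax only ever receives a number, a fault in that sub program cannot expose the user’s other details.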

More on sub-programs in program design and architecture.

Name conventions

To help with the maintainability of programs, have a naming convention that is consistent, easy to read, and not overly burdensome. For example, if we collected the first and last name of a user then we could call the first name variable:

first_name
f_name
f_n

The first one is very clear on what it is. However, it does take a while to type and takes up space (the ease of reading code is based on reading it line by line; if it breaks over multiple lines it is harder to follow, despite the massive screens we have these days). Using this convention also means that the last name will be last_name, i.e. the full description separated by an underscore.

The second one, f_name, is less descriptive but shorter, so it takes less space and is faster to type. It sets a standard for last name to be l_name which, again, is less descriptive but takes up less space. Another advantage of this format is that the second word indicates the class or type of variable: something_name indicates that the something needs to fit with name, i.e. first, last, maiden, chosen, etc. Basically, you can guess what the variable could be and you’d probably be right.

The third one is super efficient as it takes up very little space. But it’s tricky to know what it could be from just looking at it. Using this convention last name would be l_n, which is very tricky to work back to last name.

All three are valid, with pros and cons, and it’s up to you how you want to name variables, but having a standard will help you create variables as you won’t have to think about naming things. And when you go back to the source code, and you will, you will be able to follow how the program is running. When debugging and running traces, the computer will show the values of variables as the program runs. If these are easy to make out then debugging is easier.

Indentation

When we read things we pay attention not only to the words but also to the structure of what we are reading, as a shortcut to understanding what is written. For example:

Bob went to the shops to buy some milk when suddenly he heard
    “Stop!”
Bob froze just before a speeding car came flying past him.
    “Thank you” said Bob to the kind stranger.

As you can see, we have indented the speaking parts to help the reader know what is narrative and what is speech. We do the same with computer languages to help the reader. Some languages, notably Python, use the indents as part of the structure of the program.

# Read the first and last name from the user
f_name = input()
l_name = input()

# The indented lines belong to the branch above them
if f_name == 'Bob':
    print("Welcome Bob")
else:
    print("Welcome stranger")

The indent shows the options for the if statement. It’s the same for loops, where the indent helps show where the loop starts, what’s in the loop, and where it stops.

Commenting

The final part of defensive, and generally good, programming is commenting the code. This means writing text in the code that is not part of the program but helps someone reading it understand what it does. At the basic level the program will have a title and a description of the code. Better still is a marked-out structure showing where things like variables, if statements, and loops are. How you write comments depends on the programming language. In Python, a # (hash, or US pound sign) tells the computer to ignore everything after it on that line. You don’t have to go crazy with the comments, as the code should be structured to be readable by itself. The most likely person to read the comments is you, if you wrote the code, plus anyone who reviews it, so it is best to keep things clear and tidy.
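
As one possible layout, the short Python sketch below has a title, a description, and comments marking out the structure. The section markers and the function name are just one illustrative convention, not a required format.

```python
# Title: Welcome message
# Description: Greets a user by first name.

# --- Variables ---
KNOWN_USER = "Bob"


# --- Sub program ---
def welcome(f_name):
    # If statement: the known user gets a personal greeting
    if f_name == KNOWN_USER:
        return "Welcome Bob"
    # Everyone else gets the general greeting
    return "Welcome stranger"


print(welcome("Bob"))
```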

Testing

Imagine you’ve worked it all out and you’ve finally got your program working. Hurray. “Awesome. Good job. I look forward to it passing the testing”. “Huh?”

The most important part of writing a computer program is that it works for the people that it is written for. This is the same for any machine or process designed to solve a problem. To do this we need to know and define what the problem is and what is an acceptable and unacceptable solution. It’s important to reflect a bit on the acceptable AND unacceptable as a solution can be acceptable in the short term but not in the long term or vice versa.

Types of Testing

There are two main types of testing: iterative and final/terminal. As the names suggest, the former you do as you go (iteratively) and the latter you do at the end (final/terminal).

Iterative

Final/Terminal

Errors! How they help

Syntax Errors

A syntax error in a computing language is the same as one you would have in a spoken language: stuff in the wrong order, stuff missing, or the wrong stuff added.

For example

pint("I like milk")

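will fail as there is no function called pint (unless you made one, which we advise against). Strictly speaking, Python reports this particular typo as a NameError when the line runs, whereas a missing colon, an unclosed bracket, or wrong indentation is reported as a SyntaxError (or IndentationError) before the program starts. Either way, these mistakes are picked up by the programming language when you try to run the program, if not before if you are using an integrated development environment (IDE). This is good news, as the next kind of error is worse.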
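
In Python specifically, a mistake in the grammar of the code and a misspelled name are reported in slightly different ways. The sketch below uses only the built-in compile and exec functions to show the difference; the variable names are assumptions for illustration.

```python
# compile() checks the syntax of some code without running it.
try:
    compile('print("I like milk"', "<example>", "exec")  # missing closing bracket
    syntax_result = "ok"
except SyntaxError as error:
    syntax_result = type(error).__name__

# A misspelled function name is valid syntax; it only fails when the line runs.
try:
    exec('pint("I like milk")', {})
    name_result = "ok"
except NameError as error:
    name_result = type(error).__name__

print(syntax_result, name_result)
```

Both kinds of mistake stop the program with a clear message, which is what makes them easier to find than logic errors.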

Logic Errors

Once your program is free of pesky syntax errors it will run (compiled or interpreted). That is great news, but likely the start of your problems, as all errors from here are ones where the code runs but does not do what you designed it to do. It’s the equivalent of building a machine which runs out of control, or won’t start, or gives you matchsticks when you wanted planks of wood.

Logic errors are annoying, but there is a way to try and solve them: run the program in an IDE in debug mode, which allows you to see the values of variables as the program runs. The alternative way to check for logic errors is to work out manually what your program is doing through the use of a trace table. Trace tables (which we cover in the algorithm page) take the starting values and follow them through as they are processed by the program. Trace tables are handy with simple programs or at a high level, but are a bit of a struggle when there is a lot going on.
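
To make this concrete, here is a small Python sketch of a logic error and of a trace table built in code. The averaging example and the function names are assumptions chosen for illustration.

```python
def average_buggy(values):
    """Runs without any error message but gives the wrong answer."""
    total = 0
    for value in values:
        total = value  # logic error: should be total + value
    return total / len(values)


def average_traced(values):
    """Corrected version that records one trace row per pass of the loop."""
    trace = []
    total = 0
    for step, value in enumerate(values, start=1):
        total = total + value
        trace.append((step, value, total))
    return total / len(values), trace


result, trace = average_traced([2, 4, 6])
for row in trace:
    print(row)  # columns: step, value, total
```

Following the total column row by row is exactly what you would do by hand in a trace table, and it shows straight away whether the running total grows as expected.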

Other errors

Syntax and logic errors occur in programs, but there are other errors that you may have to consider that come from outside the program itself. There are quite a few, but luckily they sit outside the remit of the developer.

Network errors

Problems between computers in a network. Several things can go wrong, from incorrect data formats to lost packets, but in general the main problem is trying to connect to a resource that is no longer available or that doesn’t do what you think it is supposed to (see API).

Testing Data

Normal

Boundary

Invalid/Erroneous

Refining Algorithms
