Cloud Design Principles

When creating a thing – it doesn’t matter what it is – you have two choices on how: 1) make it up as you go or 2) have a design or plan to work through. For many of the things you make, option one is a good one, and it’s how most of us live our lives from one day to the next. This is OK for the small stuff that doesn’t affect you over the long term or affect others in the short term. For the big stuff a plan is needed. Now, let’s be clear: when I say plan I don’t mean step-by-step instructions but an idea of how to achieve an outcome over time using resources.

If the thing you are building is for yourself and with your own money then you are probably going to focus on the time it takes and the money it will cost, as you will be the one who uses it and pays for it. However, if you are going to build something for someone else then a plan is not enough: you need a design, a blueprint, instructions on how you think it will achieve the outcome that the customer wants. To come up with the initial design you need to know two main factors:

  1. Limitations or constraints of the system – the space in which the system has to fit
  2. Principles of design – rules or even laws that the design has to fit into

For example, say you want to build a shed in your neighbour’s garden. You will be given a space for the shed to fit into, along with planning regulations that give a rough idea of the shape and location of the shed. The materials, layout and other details are all to play for, and this is where the principles of design come in. The principles of design state how the shed will be built – the structure that makes sure it functions for the lifespan of the shed, say 10 years. Fundamental principles are based on access (how to get in and move around), how to store things or use tools, and how it will be secured from nosy critters, both wildlife and robbers.

For computer systems these two factors still apply. We need to know where the system has to work – the location of the system and the legal requirements (if any) of operating the system in that area. For example, an online store needs to be able to collect payments. The location of the transaction is important when it comes to where you pay tax on that transaction. The design principles also apply to IT systems, and they become specific when designing things on the cloud.

Google Core Principles of System Design

Google defines its design principles in its Cloud Architecture Framework.

  1. Document Everything – the number one reason systems fall over: in all the excitement of building the solution, no one stopped to write down how it works or what decisions were made in the design. Documentation these days can take many forms but it needs to be findable and readable.
  2. Simplify design and use fully managed services – keeping things simple means that they are easier to maintain. This doesn’t mean the design has to be kept at a level where anyone can understand it, but it does have to make sense to someone who either knows or is learning the system. Using managed services keeps things simple from the start.
  3. Decouple architecture – the whole point of using cloud is to have a system that can expand and contract with load, otherwise known as elasticity. For things to be flexible they can’t be rigidly stuck together, so design things that are loosely coupled or decoupled (it’s the same thing).
  4. Use stateless architecture – the idea of stateless can be a little confusing but it is vital when working in cloud computing. Stateful designs keep a one-to-one relationship, or memory, between the client (you) and the server (computer): the server remembers what information has already been given, in the same way a waiter remembers everything a customer has said about an order in a restaurant. For small operations, or ones where the load can be calculated, stateful works well as the client is always connected to the right server (waiter), who can be asked about the status of a request or have items added or removed. In cloud computing, where things are designed to scale up and down automatically, this is not possible as the server may appear and disappear depending on load, like the waiter leaving when the lunch rush is over but you haven’t been served. Stateless means that a shared service is used to provide all the servers with a client’s data, which gives the system the ability to be flexible.
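
To make the waiter analogy in point 4 concrete, here is a minimal Python sketch of stateless handling. All names in it (SharedOrderStore, StatelessServer, "table-7") are made up for illustration, and the in-memory dictionary only stands in for a real shared service such as an external cache or database.

```python
class SharedOrderStore:
    """Stand-in for a shared service (assumed: a cache or database in practice)."""

    def __init__(self):
        self._orders = {}

    def add_item(self, client_id, item):
        self._orders.setdefault(client_id, []).append(item)

    def get_order(self, client_id):
        # Return a copy so callers cannot change the stored order by accident.
        return list(self._orders.get(client_id, []))


class StatelessServer:
    """A 'waiter' that keeps no order data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle_add(self, client_id, item):
        self.store.add_item(client_id, item)
        return f"{self.name} added {item}"

    def handle_status(self, client_id):
        return self.store.get_order(client_id)


store = SharedOrderStore()

# The first server takes part of the order.
server_a = StatelessServer("server-a", store)
server_a.handle_add("table-7", "soup")

# The lunch rush ends and server-a is scaled away; server-b takes over.
del server_a
server_b = StatelessServer("server-b", store)
server_b.handle_add("table-7", "bread")

order = server_b.handle_status("table-7")
print(order)  # ['soup', 'bread']
```

Because the order lives in the shared store rather than in either server, the client never notices that the server changed halfway through.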

Amazon Web Services (AWS) 6 Pillars

There are 6 pillars of a well-designed AWS system (they can be applied to any system and are useful for systems engineers):

  1. Operational Excellence
    Monitor the system – know what is going on (observability: logs and metrics), set alerts for when things go out of bounds, and be able to interrogate logs when things go wrong so they can be corrected (Systems).
    Improve the system – know what new features will do. Again, use metrics and logs to see how things are running (Processes).
  2. Security
    Protect both information (encryption) and Systems (walls, doors, and keys)
  3. Reliability
    Ensure the system – know the system is designed to work at certain loads (no guesswork) (Workloads).
    Recover the system – know how to bring the system down and back online in a controlled way (Failures).
  4. Performance Efficiency
    Select the right resources – right types, right sizes, right performance to make the system efficient (cost vs price)
  5. Cost Optimisation
    Understand the cost – what are the trends that are costing money
    Control the cost – only spend what needs to be spent
  6. Sustainability
    Minimise the impact on the environment – turn things off, don’t start things that are not needed.
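
As a rough illustration of the "set alerts when things go out of bounds" idea under Operational Excellence, the Python sketch below checks a set of metric samples against allowed ranges. The metric names, the limits and the helper out_of_bounds are assumptions made up for this example; a real system would use its cloud provider's monitoring service rather than hand-rolled checks.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # name of the metric being watched (hypothetical names below)
    lower: float  # lowest acceptable value
    upper: float  # highest acceptable value


def out_of_bounds(samples, thresholds):
    """Return one alert message per metric that is missing or outside its range."""
    alerts = []
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            # No data is itself worth an alert: the system may have stopped reporting.
            alerts.append(f"{t.metric}: no data")
        elif not (t.lower <= value <= t.upper):
            alerts.append(f"{t.metric}: {value} outside [{t.lower}, {t.upper}]")
    return alerts


thresholds = [
    Threshold("cpu_percent", 0, 80),
    Threshold("error_rate", 0, 0.01),
]
samples = {"cpu_percent": 93.0, "error_rate": 0.002}

alerts = out_of_bounds(samples, thresholds)
for alert in alerts:
    print(alert)
```

Here only the CPU reading falls outside its range, so a single alert is produced; the error rate is within bounds and stays quiet.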

Microsoft Azure

Microsoft also has design principles for its cloud platform, Azure. There are 10 of them.

  1. Design for self healing – when working on cloud systems there is always that nagging feeling that you can’t touch the system you are working on – you have to work at a distance. Therefore, to make yourself feel better, make sure the system can stand itself back up if it falls over. The system shouldn’t fall over but should fail gracefully into a state from which it can be fixed. How this happens depends on the design but it shouldn’t involve a manual process.
  2. Make all things redundant – it’s good if the system doesn’t fail at all, but if it does it will fail at one point first. To avoid this, have redundancy to reduce the impact of a failure. This can seem expensive as you are paying for a resource that is not being used, but just think of the working part as costing twice as much and charge accordingly. Not all redundancy has to be an exact copy, but it has to be big enough to take the stress whilst the main one recovers.
  3. Minimise coordination – this is basically the same as keeping components loosely coupled (communicating via messages). With things loosely connected, scalability is possible.
  4. Design to scale out – as with number 3, design things to scale through loosely coupled components.
  5. Partition around limits – partitioning a system is a way to protect it from failure, so it is linked to point 2. All systems have limits, be that the whole system, a section, or even a single component. Cloud systems have limits already set, so design around these via partitions, especially on big components like databases.
  6. Design for operations – it’s important to meet customer needs but the system also needs to be maintainable by the operations team. There is no point having a system that is impossible to maintain and upgrade. Make sure it uses standards and easy-to-use tools.
  7. Use managed services – only look after things you want to look after (are interested in). Use defined platforms (Platform as a Service – PaaS) rather than infrastructure (Infrastructure as a Service – IaaS).
  8. Use an identity service – this one is related to security: you let a 3rd party provider look after your identity management. Identity management is part of Identity and Access Management, where identity is the front-door credentials that a user uses to gain access to a service. There is then a further layer covering what the identity can do and what it can access.
  9. Design for evolution – new things come in, old things go out. Best not to design in a way that means you can’t make use of those sexy new things and are stuck with the old. This doesn’t mean
  10. Build for the needs of business – last but most important: design things to meet the needs of the thing that pays for the system in the first place. Business doesn’t mean exclusively customers but also the people who work for and are interested in the business, which includes staff and other stakeholders.
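
Points 3 and 4 (loosely coupled components that talk via messages, and scaling out) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the standard library queue stands in for a managed message queue, threads stand in for server instances, and the names worker and run are made up for the example.

```python
import queue
import threading


def worker(name, inbox, results, lock):
    """Take messages off the queue until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:
            break
        processed = message.upper()  # placeholder for real work
        with lock:
            results.append((name, processed))


def run(messages, worker_count):
    """Process all messages with the given number of workers."""
    inbox = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}", inbox, results, lock))
        for i in range(worker_count)
    ]
    for t in workers:
        t.start()
    # The producer only knows about the queue, not about who consumes from it.
    for message in messages:
        inbox.put(message)
    # One sentinel per worker tells each of them to stop.
    for _ in workers:
        inbox.put(None)
    for t in workers:
        t.join()
    return results


results = run(["order", "payment", "refund"], worker_count=2)
print(sorted(processed for _, processed in results))
```

Scaling out here means changing worker_count and nothing else: the producer and the workers never refer to each other directly, only to the queue between them. Which worker handles which message is not fixed, which is why the example sorts the results before printing.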

Summary

Here is a summary of the three main cloud providers and their design principles:

Google                                         | AWS                    | Azure
-----------------------------------------------+------------------------+--------------------------------
Document Everything                            | Operational Excellence | Design for self-healing
Simplify design and use fully managed services | Security               | Make all things redundant
Decouple architecture                          | Reliability            | Minimise coordination
Use stateless architecture                     | Performance Efficiency | Design to scale out
-                                              | Cost Optimisation      | Partition around limits
-                                              | Sustainability         | Design for operations
-                                              | -                      | Use managed services
-                                              | -                      | Use an identity service
-                                              | -                      | Design for evolution
-                                              | -                      | Build for the needs of business