System: A collection of technologies/components communicating with each other to serve a particular set of users.
Design: The process of understanding the requirements, constraints, concerns, and tradeoffs, and choosing the technologies and how they work together to serve the purpose.
Process Of Designing
Try to figure out/break down the system into functional and non-functional requirements.
Functional Requirements could be: initiate a payment, place an order, add to the cart.
Non-functional Requirements could be: latency of requests, easy deployability, fail-safety, security, etc.
You can ask questions about the functional as well as non-functional requirements and do a capacity estimation of how many resources are needed to serve a particular number of users.
Capacity Estimation:
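A quick back-of-the-envelope sketch of how such an estimation goes (every input number here is a made-up assumption, just to show the shape of the calculation):

```python
# Back-of-envelope capacity estimation. All input numbers are assumptions
# for illustration, not figures from any real system.
daily_active_users = 1_000_000
requests_per_user_per_day = 10
seconds_per_day = 24 * 60 * 60  # 86,400

avg_rps = daily_active_users * requests_per_user_per_day / seconds_per_day
peak_rps = avg_rps * 3  # assume peak traffic is roughly 3x the average

avg_payload_bytes = 2_000  # assume ~2 KB per response
bandwidth_bytes_per_sec = peak_rps * avg_payload_bytes

storage_per_user_bytes = 10_000  # assume ~10 KB stored per user
total_storage_gb = daily_active_users * storage_per_user_bytes / 1e9

print(f"average RPS:    {avg_rps:.0f}")
print(f"peak RPS:       {peak_rps:.0f}")
print(f"peak bandwidth: {bandwidth_bytes_per_sec / 1e6:.2f} MB/s")
print(f"storage:        {total_storage_gb:.0f} GB")
```

The point is not the exact numbers but the habit: users x request rate gives RPS, RPS x payload gives bandwidth, users x data-per-user gives storage.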
Components of a System: They can be data, databases, networks and their protocols, APIs, caches, gateways, load balancers, reverse proxies, queues, cloud providers, and their communication.
Client-Server Architecture (2-tier architecture): Client and server communicate over the network; mainly the client initiates the communication and the server serves the client (with data, business logic, or processing).
In this architecture, there could be multiple kinds of clients,
thin client: the client has no logic; everything is served by the server (Netflix, YouTube)
thick client: the client has a lot of logic built into it (Outlook, video editing on the web)
N-Tier Architecture: When an application becomes complex, there could be a separation of layers like a business layer, API layer, database layer, and client. Multiple layers could be added to suit the system's needs, like reverse proxies/load balancers or caching for faster performance.
Proxies: "On behalf of." A component doing something on behalf of another component is a proxy.
Forward Proxy: Sits on the client side to talk to the server (for anonymity, caching, firewalling, security, organization-level authentication, etc.)
Reverse Proxy: Sits on the server side and acts as a middleman between client and server, hence decoupling them. Used for load balancing, caching, DDoS protection, SSL termination, etc.
It can be a single point of failure.
Data and Data Flow: Data is the core of most systems, so the data, its type, its format, and its transport mechanism drive most of the system design process.
Multiple layers may need different formats of the same data, JSON to DB entities to domain entities or files. Same data could be stored in different stores like queues, cache, and DB and could flow differently via APIs, events, or messages.
A huge part of system design is understanding how data is generated in the first place (users, insights, etc.) and how it flows through the system to be stored in a particular format in some store.
There are multiple factors that drive these decisions, including the volume of data, the type of data, consumption and retrieval frequencies, and security.
Databases:
Relational DB:
Choose a relational database if your app requires data consistency, data that can be represented in tables, ACID properties, and a fixed schema; it is easy to scale vertically.
Schema: how your data is structured (table, rows, and relations)
ACID:
Atomicity: Either the whole transaction happens or nothing at all; there is no partiality.
Consistency: The state of the data will always be consistent.
Isolation: Multiple concurrent transactions do not know about each other.
Durability: Ensures persistence; data and logs survive for recovery, etc.
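Atomicity can be seen in a small sketch using Python's built-in sqlite3: a transfer that fails mid-transaction rolls back completely (the table and amounts are illustrative):

```python
import sqlite3

# Atomicity sketch with SQLite (stdlib): a money transfer either fully
# commits or fully rolls back -- never a partial state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        # simulate a business-rule failure in the middle of the transaction
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass  # the whole transfer was rolled back

# The failed transfer left no trace: both rows are unchanged.
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# -> [('alice', 100), ('bob', 0)]
```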
Non-relational databases (NoSQL): Use when you don't have a fixed schema; they can be scaled horizontally.
NoSQL
Advantages
NoSQL is cheap, stores data in JSON-like documents, and has no joins (good when you rarely need to join data).
The NoSQL schema can be flexible: there need not be nullable fields, and adding new fields is easy.
But related data has to go into a single insertion, and updating data can mean rewriting the whole document.
In-built horizontal partitioning; availability is prioritized over consistency.
Good for aggregations (total salary, avg age etc)
Disadvantages
If you have lots of updates, it's expensive, because the store effectively supports only inserts and deletes.
ACID is not guaranteed
No Transaction
NoSQL is not read-optimized for single fields: we have to read the whole document to get one field, whereas in SQL we can select just one column.
No constraints, so you can't enforce relationships; even if you join two collections it's hard, so the data is also not very consistent.
Types of NoSql:
Key-Value Store: When the needed data maps to a single key, e.g. discount vouchers, user cache data (Redis, DynamoDB, Memcached are key-value stores).
Very fast retrieval.
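A minimal sketch of the key-value model with per-key TTL (the class and method names are illustrative, not a real Redis/DynamoDB API):

```python
import time

# Minimal key-value store with per-key TTL -- the model behind Redis /
# Memcached-style caches. Names here are illustrative, not a real API.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        # ttl is in seconds; None means the key never expires
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default  # key was never set
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry on read
            return default
        return value

store = KeyValueStore()
store.set("voucher:123", "10% off", ttl=3600)
print(store.get("voucher:123"))  # -> 10% off
```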
Document-based Databases: No fixed schema (dynamic data); they support high volumes of reads and writes. They have collections and documents instead of tables and rows. Joins aren't supported, so data will be redundant. ACID properties are not provided, and every document can have different fields (null or undefined properties), so the application has to handle all those scenarios.
Column DBs: Somewhat fixed data but without ACID (used to store analytics data, IoT, health checks, etc.). They usually support write-intensive workloads (Cassandra, HBase, ScyllaDB) and also work as distributed databases.
Search Databases: Used for searching text across very large amounts of data; heavily indexed, and used for read-intensive systems (Elasticsearch, Solr, etc.)
Applications and Services: the functionality the system provides.
Making payments, placing orders, performing transactions, serving data.
Client App: rendering data, collecting data, communicating to the backend, handling events.
Backend App: Data modeling, sending, receiving data, data transformation, business logic, DB connections.
An application has requirements, an interface to collect data, layers, code structure, and cost/performance metrics, plus deployability, monitoring, and logging; it should be resilient and reliable.
Applications could be monoliths or service-oriented architectures (microservices).
API (Application programming interface)
Provides programmatic application-to-application interaction; communication is the important part, and so is abstraction.
A good API has a defined contract, documentation, a data format (REST, SOAP, RPC), and security.
Rate limiting, throttling, etc
Cache:
Computed responses for very frequently retrieved data, or results of heavy computation, are saved into fast storage (in-memory or a separate layer) to be served quickly.
If the cache serves a response successfully it's called a cache hit; if it doesn't have the response, it's called a cache miss.
Cache Invalidation: Accommodating/updating changes in the data.
It could be done via TTL (time to live: the entry is deleted automatically after that time).
Cache Eviction: A cache supports a limited number of keys, so we have to evict less-used keys using strategies such as first in first out (FIFO), least recently used (LRU), and least frequently used (LFU).
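The LRU strategy can be sketched with Python's OrderedDict (capacity and keys are illustrative):

```python
from collections import OrderedDict

# Sketch of a Least-Recently-Used eviction policy.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # touch "a" so it becomes most recently used
cache.put("c", 3)      # over capacity: evicts "b", the least recently used
print(cache.get("b"))  # -> None (evicted)
print(cache.get("a"))  # -> 1
```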
Cache Patterns
Read Through: You read from the cache, and the cache reads from the DB. It has to be provided by a third-party library or component, and the DB and cache models have to be the same.
Write Through: When you are writing to the cache, and the cache is writing to the DB
In both patterns above, the cache is a single point of failure in the system.
Cache Aside: When you are reading from the cache when it misses you read from DB and update the cache. Works great for read-heavy. cache invalidation has to be considered.
Write Back: You write to the database and then write to the cache. It's more consistent.
Bulk Write: You write to the cache, and the cache writes to the DB in bulk to save network cost; if the cache fails, data is lost, but it's great for write-heavy systems.
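The cache-aside pattern can be sketched like this (plain dicts stand in for a real cache and DB, and the key names are illustrative):

```python
# Cache-aside sketch: read the cache first; on a miss, read the DB and
# populate the cache. Writes go to the DB and invalidate the cached copy.
db = {"user:1": {"name": "Ada"}}
cache = {}
stats = {"hits": 0, "misses": 0}

def get_user(key):
    if key in cache:
        stats["hits"] += 1
        return cache[key]      # cache hit
    stats["misses"] += 1
    value = db.get(key)        # cache miss: go to the source of truth
    if value is not None:
        cache[key] = value     # populate for the next reader
    return value

def update_user(key, value):
    db[key] = value
    cache.pop(key, None)       # invalidate so readers don't see stale data

get_user("user:1")   # miss -> fills the cache
get_user("user:1")   # hit
print(stats)         # -> {'hits': 1, 'misses': 1}
```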
Caching can happen on a browser level, API gateway, reverse proxy, a separate layer in the system, and in-memory cache in the application.
REST API: A Design Approach
HTTP communication between software; it has verbs to communicate what to do, endpoints to identify where, and headers to communicate more information.
Verbs
GET, POST, PUT, PATCH, DELETE, COPY, HEAD, OPTIONS, LINK, UNLINK, PURGE, LOCK, UNLOCK, PROPFIND, VIEW
Best Practices
0. Stateless
1. JSON Data format
2. Use nouns (not GetUser but Users with a POST endpoint; not DeleteBook but Books with the DELETE verb)
3. Use plurals (Users, Books)
4. Use HTTP status codes (200, 201, 404, 500)
5. Paging, filtering, and sorting when getting collections
6. Versioning: localhost/v1/orders
7. Docs (Swagger)
8. Using SSL, TLS (HTTPS)
Throttling
Rate limiting: Number of requests per user in a given time
IP-Level Throttling: No. of requests per IP
Concurrent Level: No. of concurrent requests per client
Resource Level: DB/Particular resource throttling
Rate Limiting
API rate limiting is important when you have millions of requests to serve.
A simple strategy you can implement is a limit on the number of requests per user per second, or the number of concurrent requests per user, or dropping/deferring less important requests (e.g. reporting or analytics) in favor of critical POST/GET traffic.
Usually a token count in Redis is useful, as Stripe does for each request: once the tokens are exhausted, drop further requests.
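A token-bucket limiter of the kind described can be sketched as follows (capacity and refill rate are assumptions; a production version would keep the counts in Redis rather than in process memory):

```python
import time

# Token-bucket rate limiter sketch: each request costs one token; tokens
# refill continuously up to a fixed capacity.
class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # add tokens for the elapsed time, capped at the bucket's capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # tokens exhausted: drop or defer this request

bucket = TokenBucket(capacity=3, refill_per_sec=1)
print([bucket.allow() for _ in range(5)])  # -> [True, True, True, False, False]
```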
Authentication/Authorization
Session-Based Authentication is usually implemented as a cookie in the browser: the client sends a key for the particular session, which the server generated when the client provided its credentials. The session is invalidated on logout.
State-based session: the server has to remember the sessions, which means a Redis cache or some memory to store them.
Basic Authentication: The client needs to send a username and password every time. It's stateless
In the Authorization header: username:password, encoded in base64.
You send: Basic <base64-encoded value of username:password>
The reason we encode is so that characters that are not HTTP-compatible get a safe representation.
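Building the header is a short sketch (note that base64 is encoding, not encryption, so this only makes sense over HTTPS):

```python
import base64

# How the Basic auth header value is built: base64-encode "username:password".
def basic_auth_header(username, password):
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

print(basic_auth_header("alice", "s3cret"))
# -> Basic YWxpY2U6czNjcmV0
```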
Digest Access Authentication: Encrypted Token
Asymmetric Cryptography Authentication: An encrypted token that can only be decrypted by the server.
OAuth
A standard to grant access and authenticate multiple clients, etc.; implements single sign-on and logging in across multiple websites.
JWT (JSON web tokens)
A signed token with all the client's information; when the client returns, we verify the signature.
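The idea behind JWT can be sketched with a toy HMAC-signed token. This is not the full JWT spec (no header segment, no expiry claims); a real system would use a library such as PyJWT, and the secret below is a placeholder:

```python
import base64, hashlib, hmac, json

# Toy signed-token sketch: the server signs the payload, and on return it
# only re-computes and compares the signature -- no server-side session.
SECRET = b"server-side-secret"  # placeholder; kept only on the server

def b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign(payload: dict) -> str:
    body = b64(json.dumps(payload, sort_keys=True).encode())
    sig = b64(hmac.new(SECRET, body.encode(), hashlib.sha256).digest())
    return f"{body}.{sig}"

def verify(token: str) -> bool:
    body, sig = token.split(".")
    expected = b64(hmac.new(SECRET, body.encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)  # constant-time comparison

token = sign({"user_id": 42, "role": "admin"})
print(verify(token))               # -> True
print(verify("f" + token[1:]))     # -> False (tampered payload)
```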
Queues
Asynchronous communication, message/event store that takes in messages and is picked by the consumer as per their frequency.
It could be an in-memory or a separate layer.
It's easy to horizontally scale consumers and producers; a queue can also store data when servers are down and can help handle heavy load and sudden spikes in requests.
Producer/Consumer: The producer pushes messages in, the consumer consumes them, and each message is removed from the queue. In this one-to-one model, each message is consumed only once.
Ordering: An ordered queue has to process all pending messages before processing a new one, and if some message fails, all the following messages won't get processed.
In an unordered queue, when consumption fails the message is sent to a dead-letter queue to be processed later.
In a chat application order matters, but in reporting it might not.
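The one-to-one producer/consumer model can be sketched with Python's stdlib queue (the message names are illustrative):

```python
import queue
import threading

# One-to-one producer/consumer sketch: each message is consumed exactly once,
# in FIFO order, and the consumer drains the queue at its own pace.
q = queue.Queue()
processed = []

def producer():
    for i in range(5):
        q.put(f"order-{i}")   # push messages onto the queue
    q.put(None)               # sentinel: no more messages

def consumer():
    while True:
        message = q.get()
        if message is None:   # producer is done
            break
        processed.append(message)
        q.task_done()

threading.Thread(target=producer).start()
consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()
consumer_thread.join()
print(processed)  # -> ['order-0', 'order-1', 'order-2', 'order-3', 'order-4']
```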
Publisher-Subscriber: When a message is broadcast for one or more than one subscriber to act upon it.
There could be multiple subscribers for a particular message, or a single one. The producer generates the messages; a message broker in between can enrich or divide a message and send it to an exchange for consumers, where each consumer may consume only the messages it subscribed to.
Ordering is not guaranteed here but a priority queue can be developed to handle priority messages.
This pattern can be used for asynchronous processing, decoupling, load balancing, deferred processing, and data streaming.
Don't use it for small data requirements or synchronous communication; it also does not support acknowledgement.
Scaling
To scale queues, there could be a round-robin of multiple queues for sending and receiving messages, and there could also be primary and secondary queues.
Kafka vs Rabbit MQ/ActiveMQ
Kafka handles small events, is much faster, and streams events, whereas RabbitMQ is heavier, takes more loaded objects, and works with messages rather than streams.
Kafka is better for logging, heartbeats, and smaller object transfers; RabbitMQ is used for transactional data and carrying large objects.
Kafka supports publish/subscribe only; RabbitMQ does one-to-one, one-to-many, and also topics.
Performance Metrics
Throughput: Amount of work done in a particular time (e.g. 20 requests per second)
Bandwidth: Capacity to transfer data from one end to the other; network capacity, etc.
Response Time: Time taken per request (e.g. 1 sec per request)
Measuring Performance:
Database: time spent on the DB; use indexing, avoid too many joins
Cache: Latency to write
Message Queues: Speed of pushing and pulling the messages in the queue
Workers: Performance, time taken, memory usage
Instance Performance: RAM and CPU usage
Tools to measure performance: New Relic, Datadog, VividCortex, Azure Monitoring
Fault & Failures:
Transient Fault
Permanent Fault
Network Fault (Timeout)
Hardware Fault:
Can be solved with replication, load balancer, multi-region deployment
Database:
Replication, failover mechanism
Snapshot: A DB transaction log or snapshot is taken at particular times; if the DB goes down, it can be restored to the point in time when it went down.
Database Replication
Replication means to have a copy/replicate
Primary -> Secondary, Master -> Slave
When something goes wrong with the DB, on failover the slave becomes the master.
Usually the primary DB is used for writes and updates and the secondary for reads, hence distributing the load.
Replication also increases performance, because a replica can be geographically near the client/app, which saves network time.
Replication Lag: The time between a write and its replication; when it grows, clients reading from replicas get inconsistent (stale) data.
So there are strategies to make data consistent;
Read-after-write (synchronous update): the master completes a write only when it has been written to the slaves too. It takes time, but the system is always consistent.
Asynchronous replication: the master sends the writes but won't wait for the replicas to complete them. It is much faster, but inconsistent.
Hybrid semi-synchronous update: the primary waits for only one replica to acknowledge, or for N replicas where N is usually less than the total number of replicas (N is called the quorum).
So in high-consistency applications, e.g. banking, synchronous replication is used; in other scenarios asynchronous will suffice.
How replication works: it can be partial (only a particular table is replicated), streamed (all writes are replicated as they happen), or bulk (after some threshold, data is dumped to the replicas).
Memory Usage
CPU Usage: Multiple CPUs
Bugs
Canary Deployment, Unit Testing, QA, Regression, Deployment Multi-Region
The UI should handle failures gracefully.
Scaling:
Vertical: when you increase the capacity of existing resources (RAM, CPU, etc)
Horizontal: when you increase the number of resources themselves (CPUs, nodes)
CAP Theorem:
C: Consistency: the system is always consistent, whatever the condition.
A: Availability: the system is always available, all the time.
P: Partition Tolerance: the system still works even when network partitions cut off parts of the distributed system.
The theorem states that a distributed system can't have 100% availability and 100% consistency at the same time.
That means we have to choose between degrees of availability and consistency and pick a trade-off while designing the system.
For example: when bank branches are not connected, a person could withdraw an amount from one branch and do the same from another branch; he might withdraw more than the amount he has. So either the branches have to be connected, or he should be able to withdraw only from his own branch.
But for deposits, he can deposit at multiple branches. That ensures availability over consistency while depositing (updating), and consistency over availability while withdrawing.
So the overall system is somewhat available and somewhat consistent.
Database Partitioning
A database is partitioned when it can't be scaled further on a single physical instance.
There is horizontal partitioning (row partitioning of tables) and vertical partitioning (column partitioning of tables).
Horizontal partitioning is called sharding.
Database Sharding
Horizontally partitioning data.
Logical Sharding: Sharding according to some logic; the shards could be on the same machine or on different ones.
Physical Sharding: The data is physically on other machines.
Sharding is advantageous beyond handling data volume: it distributes the query load too, and since the data is distributed, if one partition goes down you can still serve the users whose data is available in other partitions.
Sharding Strategies
Dynamic Sharding:
The client asks some sharding module or other service for the data's location.
Directory-based Sharding: A lookup table decides where data should be located among your shards. Usually it's pre-planned so the data is distributed evenly.
Disadvantages: the lookup table shouldn't be large and must always be available, because without it you can't know where to read or write data; it becomes a single point of failure.
Algorithmic Sharding:
A function within the client decides which of the partitions the data has to go to.
Key-based Sharding: A hash function on some key decides which partition data goes to or is read from. It's fairly consistent and data is evenly partitioned, but a problem arises when a partition is added or removed: you have to recalculate the hashes and move data around to keep it consistent.
Sometimes, even without a hash function, a key alone can be used for sharding (country/city in some cases). But it should be immutable in nature; otherwise you'll have to migrate the data to different shards.
A shard key could be a combination of multiple keys too.
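Key-based sharding and its re-partitioning downside can be sketched like this (the shard counts and key names are assumptions):

```python
import hashlib

# Key-based sharding sketch: hash(shard_key) mod N picks the partition.
# The downside shows up when N changes: most keys then map to a new shard.
def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

keys = [f"user-{i}" for i in range(1000)]
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}  # one shard added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys must move")  # typically ~80% with mod-N
```

That mass reshuffle on resize is exactly the problem consistent hashing (below in the Hashing section) is designed to avoid.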
Range-based Sharding: You partition data based on a range, maybe months, the first character of a username, a user id, or a date. This usually works best when the range is often used in queries: records sharded by month, say, are efficient to query by date because the system knows where to search.
The downside can be uneven distribution: some nodes may have much more data than others. Those nodes are called hotspots.
Drawbacks For Sharding
When data is not evenly distributed, some are overloaded and some are less.
Combining data across shards is hard; reversing a shard split is complex.
When a query searches through multiple shards, it's much slower because of the overhead of combining the data.
Not all databases provide sharding capabilities.
Hashing
Hashing is computation on data to convert it to a number; the same data should return the same hash every time. So when dividing load/data between servers, we use a hash of the data to decide where a request or piece of data should go. But a problem arises when we add or remove servers: we have to redistribute the data.
Consistent Hashing: A hash is also computed for each server, in the same hash space as the data.
So S1 could have a hash of 1, S2 a hash of 64, and so on; data is distributed so that anything hashing greater than S1 and up to S2 goes to S2, and anything between S4 and S1 goes to S1. If one of the servers goes missing, only that server's range has to be redistributed, which saves a lot of data movement; that is the advantage of consistent hashing.
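A minimal consistent-hash ring sketch (server names are illustrative; real implementations also place many virtual nodes per server to even out the ranges):

```python
import bisect
import hashlib

# Consistent hashing sketch: servers and keys share one hash space; a key
# goes to the first server clockwise from its hash. Removing a server only
# remaps the keys that lived on it.
def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        self._ring = sorted((h(s), s) for s in servers)

    def server_for(self, key: str):
        hashes = [item[0] for item in self._ring]
        # first server hash >= key hash, wrapping around the ring
        idx = bisect.bisect_right(hashes, h(key)) % len(self._ring)
        return self._ring[idx][1]

    def remove(self, server):
        self._ring = [item for item in self._ring if item[1] != server]

ring = ConsistentHashRing(["S1", "S2", "S3", "S4"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.remove("S2")
after = {k: ring.server_for(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of 1000 keys moved")  # only the keys that lived on S2
```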
Event-Driven Architecture (Request/Reply)
You communicate asynchronously with events, primarily via queues, and wait on a queue for the response. To make queue messages idempotent, you can add a request id and check for the same id when receiving messages.
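Idempotent consumption via request ids can be sketched like this (the message shape, id, and set-based dedupe store are illustrative; a real system would persist the seen ids):

```python
# Idempotency sketch: the consumer remembers processed request ids, so a
# redelivered message is applied at most once.
processed_ids = set()
balance = {"amount": 0}

def handle_payment(message):
    request_id = message["request_id"]
    if request_id in processed_ids:
        return "duplicate-skipped"  # already applied: safe to ack and drop
    processed_ids.add(request_id)
    balance["amount"] += message["amount"]
    return "applied"

msg = {"request_id": "req-42", "amount": 100}
print(handle_payment(msg))  # -> applied
print(handle_payment(msg))  # -> duplicate-skipped (redelivery)
print(balance["amount"])    # -> 100 (not 200)
```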
Distributed Logging
It's important to be able to log across your services and build a context out of it, following one order id or transaction id through what actually happened in the life of an order.
It's really necessary to have one context id per request across all the services; then all requests can be attached to one hierarchy.
It's also good to use one wrapper for all microservices' logging, so if we change the logging system in the future, we change it in one place rather than in each service.
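A sketch of such a wrapper using Python's stdlib logging, with the correlation id carried in a contextvar (logger and id names are illustrative):

```python
import contextvars
import logging

# Every log line gets stamped with one correlation id per request, so logs
# from different services can be stitched into a single trace.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

logger = logging.getLogger("orders")
logger.propagate = False
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(message)s"))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

correlation_id.set("order-7f3a")  # set once, where the request enters
logger.info("payment initiated")  # prints: [order-7f3a] payment initiated
logger.info("payment confirmed")  # prints: [order-7f3a] payment confirmed
```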
Client-Server Communication
HTTP Request: The client sends a request and receives a response; the request rides on a TCP connection established via a handshake.
HTTP Polling: The client keeps sending multiple HTTP requests to get updated responses.
HTTP Long Polling: The client sends a request with a long timeout and the server sends a reply when it has a message for the client.
Web Sockets: client makes a handshake/first request to the server and then the server can send a request to the client and vice-versa until the handshake breaks.
It's expensive: it hijacks an entire HTTP connection.
Server-Sent Events (push notifications): The client initially makes a long-lived connection, requesting the event-stream content type, and then the server keeps sending responses multiple times; the client can't send data back on that stream, so to send something back it needs to make a normal HTTP request. It's not one JSON response; it's small, lightweight chunks of data.
It's used in a live feed, push notification, etc
The browser has an in-built EventSource() object that keeps getting requests from the server.
Prefer HTTP/2 over HTTP/1.1 here, because with HTTP/1.1 each event stream ties up a dedicated connection.
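The SSE wire format itself is simple enough to sketch: each event is "data:" lines terminated by a blank line, optionally preceded by id and event-name fields (the helper name here is illustrative):

```python
# Formatting a server-sent event the way EventSource expects it on the wire:
# optional "id:" and "event:" lines, then "data:", then a blank line.
def sse_event(data, event=None, event_id=None):
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")    # lets the client resume after a drop
    if event is not None:
        lines.append(f"event: {event}")    # named event type for listeners
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"       # the blank line terminates the event

print(sse_event("score updated", event="live-feed", event_id="17"))
# id: 17
# event: live-feed
# data: score updated
```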