Wednesday, September 8, 2021

Software System Architecture

The architecture of a system is the set of fundamental concepts
or properties of the system in its environment, embodied in its elements, relationships, and the principles of its design and evolution.

Fundamentals Of Architecture

Role of software architect: designing the system considering all the stakeholders of the software with their functional and non-functional requirements, and balancing cross-cutting concerns.

Viewpoints: No single stakeholder cares about the software architecture as a whole; viewpoints are the way to communicate particular views of the system to the relevant stakeholders, to give each of them a grasp of the part of the system that concerns them.

Perspective: It's complementary to a viewpoint and shows cross-cutting concerns like performance, security, etc.

Static Structures vs Dynamic Structures

A static structure is what is statically present in the system: database schemas, classes, data structures, etc. A dynamic structure is what the system has at run time: instances of particular classes, request instances, etc.

Externally visible behavior vs Quality properties

Externally visible behaviour is how we see the system as a black box: it takes an input and gives an output, and how is not a concern. Quality properties are performance, request times, throughput, etc.

Element

An element could be a component, module, or library that has defined interfaces, responsibilities, and boundaries.

Stakeholder

Any person, group, or company that has a stake in the software (paying for it or using it) is a stakeholder. They could be developers, testers, managers, payers, users, maintainers, etc.

Viewpoints

There are multiple stakeholders, and each needs to view the system from their own angle. A monolithic diagram or description of the architecture would be incomprehensible and useless to everyone, so we divide the architecture by what each stakeholder needs to view or is interested in. For that, we construct viewpoints. There is no standard set of viewpoints that fits all systems, but some conventional ones we can follow are: context, information, functional, concurrency, development, deployment, and operational.

Perspective

If you want a view of qualities like performance, flexibility, deployability, etc., you have to apply that perspective to your views. For example, security applies to the functional viewpoint as the user authentication flow, to the information viewpoint as where user data will be saved, and to the deployment viewpoint as whether we are taking identities from Active Directory, etc.

Role of Architect

It's very hard to define the role of the architect; many experienced people do this job with or without the title. Usually, it is the person who addresses all the stakeholders' concerns, functional or non-functional, whether the stakeholders are users, the people who are paying, the operations team, developers, or even DevOps. The architect designs the system's construction and components at the start, and verifies it at the end.

Usually, the architect role demands lots of experience with system design: someone who can anticipate the future and take it into consideration in the design. They might not be technology experts or domain experts, but rather work with those experts to reach a solution.

There could be multiple architects for infra, database, domain and product tech, etc. 

Process Of Architecture



Friday, August 27, 2021

System Design

System: Collection of Technologies/Architecture communicating with each other that serve a particular set of users.

Design: The process of understanding the requirements, constraints, concerns, and tradeoffs, and choosing the technologies and how they work together to serve the purpose.


Process Of Designing

Try to figure out/break down the system into functional and non-functional requirements.

Functional Requirements could be initiate payment, place an order, add to the cart.

Non-functional Requirements could be the latency of request, easily deployable, fail-safe, secure, etc

You can ask questions about the functional requirements as well as non-functional requirements and do a capacity estimation about how much resources are needed to serve a particular number of users.

Capacity Estimation: 
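Capacity estimation is usually a back-of-envelope calculation. A minimal sketch in Python, with purely illustrative numbers (none of these figures come from any real system):

```python
# Back-of-envelope capacity estimate (all numbers are illustrative assumptions).
daily_active_users = 1_000_000
requests_per_user_per_day = 20
seconds_per_day = 86_400

avg_rps = daily_active_users * requests_per_user_per_day / seconds_per_day
peak_rps = avg_rps * 3  # assume peak traffic is roughly 3x the average

avg_payload_bytes = 2_000  # assume ~2 KB per response
bandwidth_mbps = peak_rps * avg_payload_bytes * 8 / 1_000_000

print(f"average RPS: {avg_rps:.0f}")
print(f"peak RPS: {peak_rps:.0f}")
print(f"peak bandwidth: {bandwidth_mbps:.1f} Mbit/s")
```

The same style of arithmetic extends to storage (bytes per record times records per day times retention) and to the number of servers (peak RPS divided by RPS one server can handle).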



Components of System: They can be, Data, Database, Network, and their protocols, APIs, Cache, Gateway, Load Balancer, Reverse Proxy, Queues, Cloud Providers, and their communication.

Client-Server Architecture (2 tier Architecture): Client and Server communicate over the network and mainly client initiates the communication and the server serves the client (it could be data, business logic, or processing)

In this architecture, there could be multiple kinds of clients, 

thin client: the client has no logic; everything is served by the server (Netflix, YouTube)

thick client: the client has a lot of logic built into it (Outlook, video editing on the web)

N-Tier Architecture: When an application becomes complex, there can be a separation of layers like a business layer, API layer, database layer, and the client. More layers can be added to suit the system's needs, like reverse proxies/load balancers, or caching for faster performance.


Proxies: On behalf of. Component doing something on behalf of another component is a proxy.

Forward Proxy: On the client side to talk to the server. (for anonymity, caching, firewall, security, organization level authentication, etc)

Reverse Proxy: Sits on the server side and acts as a middleman between client and server, hence decoupling them. Used for load balancing, caching, DDoS protection, SSL termination, etc.

It can be a single point of failure.


Data and Data Flow: Data is the core of most systems, so the data, its type, its format, and its transport mechanism drive most of the system design process.

Multiple layers may need different formats of the same data: JSON, DB entities, domain entities, or files. The same data could be stored in different stores like queues, caches, and DBs, and could flow differently via APIs, events, or messages.

A huge part of system design is understanding how data is generated in the first place (users, insights, etc.) and how it flows through the system to be stored in a particular format in some store.

There are multiple factors that drive these decisions including volume of data, Type of data, consumption and retrieval frequencies, and also security.

Databases: 

Relational DB: 

Choose a relational database if your app requires data consistency, the data can be represented in tables, ACID properties are required, and the schema is fixed. Relational databases are easy to scale vertically.

    Schema: how your data is structured (table, rows, and relations) 

    ACID:

Atomicity: Either the whole transaction happens or nothing at all; there is no partiality.

Consistency: State of the data will be consistent always.

Isolation: Multiple transactions do not know about each other.

Durability: Ensures persistence and logs of data for recovery etc.

Non-relational database (NoSQL): When you don't have a fixed schema, Can be scaled horizontally.

NOSQL

Advantages

NoSQL is cheap, stores data in a JSON-like format, and has no joins (this is good when you rarely need to join).

A NoSQL schema can be flexible: there is no nullable-column problem, and adding new fields is easier.

But related data has to go in as a single insertion, and updating data can mean rewriting the whole document.

In-built horizontal partitioning, availability is prioritized over consistency.

Good for aggregations (total salary, avg age etc)

Disadvantages

If you have lots of updates, it's expensive, because the store only really supports inserts and deletes.

ACID is not guaranteed

No Transaction

NoSQL is not read-optimized for partial data: we have to read the whole document to get one field, whereas in SQL we can select a single column.

No constraints, so you can't enforce relationships; even if you join two collections, it's hard, so data is also not very consistent.

Types of NoSql: 

Key-Value Store: When the needed data maps to a single key, e.g. discount vouchers or cached user data (Redis, DynamoDB, Memcached are key-value stores.)

Very fast retrieval.

Document-based Databases: No fixed schema (dynamic data); support large volumes of reads and writes. They have collections and documents instead of tables and rows. You can't join, so data will be redundant. No ACID properties are provided, and every document can have different fields (null or undefined properties), so the application has to handle all those scenarios.

Column DBs: Kind of fixed data but without ACID (used to store analytics data, IoT, health checks, etc). They usually support write-intensive workloads (Cassandra, HBase, ScyllaDB) and also distributed operation.

Search Databases: Used for searching text over very large amounts of data; heavily indexed; used for read-intensive systems (Elasticsearch, Solr, etc).


Application And Services: the functionality the system provides.

making payments, order,  performing transactions, serving data. 

Client App:  rendering data, collecting data, communicating to the backend, handling events.
Backend App: Data modeling, sending, receiving data, data transformation, business logic, DB connections.

An application has requirements, an interface to collect data, layers, code structure, and cost/performance metrics, plus deployability, monitoring, and logging; it should be resilient and reliable.

Applications could be monolith or service-oriented architecture (microservices)


API (Application programming interface)

Provides programmatic application-to-application interaction; communication is the important part, and also abstraction.
A good API has a defined contract, documentation, a data format (REST, SOAP, RPC), and security,
plus rate limiting, throttling, etc.

Cache:

Responses computed from very frequently retrieved data or heavy computation are saved in fast storage (in-memory or a dedicated layer) to be served quickly.
If the cache serves a response successfully, it's called a cache hit; if it doesn't have the response, it's called a cache miss.

Cache Invalidation: Accommodating/updating changes in the data.
It can be done via TTL (time to live: the entry is deleted automatically after that time).
Cache Eviction: A cache supports a limited number of keys, so we have to evict less-used keys using one of several strategies: first in first out (FIFO), least recently used (LRU), or least frequently used (LFU).
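A minimal sketch of LRU eviction in Python, using `collections.OrderedDict` (the capacity and keys here are illustrative, not from any real cache):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction: when full, drop the key untouched the longest."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None  # cache miss
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # over capacity: evicts "b"
print(cache.get("b"))  # None
```

LFU and FIFO follow the same shape; only the eviction choice (a hit counter or plain insertion order) changes.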

Cache Patterns

Read Through: You read from the cache, and the cache reads from the DB. It has to be provided by a third-party library or component, and the DB and cache models have to be the same.
Write Through: You write to the cache, and the cache writes to the DB.
In both patterns above, the cache is a single point of failure in the system.

Cache Aside: You read from the cache; when it misses, you read from the DB and update the cache. Works great for read-heavy workloads, but cache invalidation has to be considered.
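The cache-aside read path can be sketched in a few lines. This is a minimal in-process stand-in, with plain dicts playing the roles of the DB and the cache (keys and TTL are illustrative):

```python
import time

db = {"user:1": {"name": "Alice"}}  # stand-in for the database
cache = {}                          # stand-in for Redis/Memcached
TTL_SECONDS = 60                    # invalidation via time-to-live

def get_user(key):
    entry = cache.get(key)
    if entry and entry["expires"] > time.time():
        return entry["value"]  # cache hit
    value = db.get(key)        # cache miss: fall back to the DB
    cache[key] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value

def update_user(key, value):
    db[key] = value
    cache.pop(key, None)  # invalidate so the next read refreshes the entry

print(get_user("user:1"))  # miss: read from DB, then cached
print(get_user("user:1"))  # hit: served from the cache
```

Deleting the cached entry on write (rather than updating it in place) is one common way to avoid serving stale data between the DB write and the cache write.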

Write Back: You write to the database first and then write to the cache. It's more consistent.

Bulk Write: You write to the cache, and the cache writes to the DB in bulk to save network cost; when the cache fails, data is lost, but it's great for write-heavy systems.

Caching can happen on a browser level, API gateway, reverse proxy, a separate layer in the system, and in-memory cache in the application.

Rest API: A Design Approach


HTTP communication between software: it has verbs to communicate what to do, endpoints to identify where, and headers to communicate more information.

Verbs

GET, POST, PUT, PATCH, DELETE, COPY, HEAD, OPTIONS, LINK, UNLINK, PURGE, LOCK, UNLOCK, PROPFIND, VIEW

Best Practices

0. Stateless
1. JSON data format
2. Use nouns (not GetUser but Users with a POST endpoint; not delete-book but Books with the DELETE verb)
3. Use plurals (Users, Books)
4. Use HTTP status codes (500, 200, 404, 201)
5. Paging, filtering, and sorting when getting collections
6. Versioning: localhost/v1/orders
7. Docs (Swagger)
8. Use SSL/TLS (HTTPS)


Throttling

Rate limiting: Number of requests per user in a given time
IP-Level Throttling: Number of requests per IP
Concurrency Level: Number of concurrent requests per client
Resource Level: Throttling of the DB or a particular resource

Rate Limiting

API rate limiting is important when you have millions of requests to serve.

A simple strategy you can implement is a number of requests per user per second, or a number of concurrent requests per user per second, or dropping/deferring less important requests (POST, GET, reporting, or analytics).

Usually a Redis token count is useful, as Stripe does per request: once the tokens are exhausted, further requests are dropped.
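The token-count idea can be sketched as an in-memory token bucket (a real deployment would keep the counters in Redis so all API servers share them; the capacity and refill rate below are illustrative):

```python
import time

class TokenBucket:
    """Each client gets `capacity` tokens; a request costs one token.
    Tokens refill at `rate` per second; when the bucket is empty,
    requests are dropped until tokens accumulate again."""
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)  # 3-request burst, 1 req/s refill
results = [bucket.allow() for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

The bucket allows short bursts up to `capacity` while enforcing the long-run average of `rate` requests per second.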


Authentication/Authorization

Session-Based Authentication is usually implemented as a cookie in the browser. The client sends a key for the particular session, which the server generated when the client provided its credentials. The session is invalidated on logout.
State-based sessions: the server has to remember the sessions, which means a Redis cache or some other memory.

Basic Authentication: The client needs to send a username and password every time. It's stateless.
In the Authorization header you put username:password, encoded in base64:
You send Basic <encoded value of username:password>
The reason for encoding is that characters that are not HTTP-compatible get encoded.
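Building the header is one line of base64 (credentials here are made up for illustration; note base64 is encoding, not encryption, so HTTPS is still required):

```python
import base64

username, password = "alice", "s3cret"  # hypothetical credentials
token = base64.b64encode(f"{username}:{password}".encode()).decode()
header = f"Authorization: Basic {token}"
print(header)  # Authorization: Basic YWxpY2U6czNjcmV0
```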

Digest Access Authentication: Encrypted Token

Asymmetric Cryptography Authentication: Encrypted Token can only be decrypted by the server

OAuth

Standard for granting access and authenticating multiple clients, etc. Implements single sign-on: logging in to multiple websites with one identity.

JWT (JSON Web Tokens)
A signed token carrying the client's information; when the client returns, we verify the signature.
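A minimal sketch of the sign-and-verify idea behind JWTs, using only the standard library (HS256-style HMAC signing; the secret and claims are illustrative, and a real system would use a JWT library and also check `exp` and other claims):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: a shared signing key

def b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(payload: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify(token: str) -> bool:
    header, body, sig = token.split(".")
    expected = hmac.new(SECRET, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), sig)

token = sign({"sub": "user-42", "role": "admin"})
print(verify(token))  # True

# Tampering with the claims invalidates the signature:
h, b, s = token.split(".")
forged_body = b64url(json.dumps({"sub": "user-42", "role": "superuser"}).encode())
forged = h + "." + forged_body + "." + s
print(verify(forged))  # False
```

The server stores no session state: everything it needs is in the token, and the signature proves the token wasn't altered.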


Queues

Asynchronous communication: a message/event store that takes in messages, which are picked up by consumers at their own pace.
It could be in-memory or a separate layer.
It's easy to scale consumers and producers horizontally; the queue can also hold data when servers are down, and it helps absorb heavy load and sudden spikes in requests.

Producer/Consumer: The producer pushes a message in, a consumer consumes it, and the message is removed from the queue. In this one-to-one model, each message is consumed only once.

Ordering: When the queue is ordered, it has to process all pending messages before processing a new one, and if some message fails, all the following messages won't get processed.
In an unordered queue, when consumption fails the message is sent to a dead-letter queue to be processed later.

In a chat application order matters, but in reporting it might not.

Publisher-Subscriber: When a message is broadcast for one or more than one subscriber to act upon it.

There could be multiple subscribers for a particular message, or a single one. The producer generates the messages, and a message broker in between can enrich the data, split the message, and send it to an exchange for the consumers, where each consumer consumes only the messages it subscribes to.

Ordering is not guaranteed here but a priority queue can be developed to handle priority messages.

This pattern is useful for asynchronous processing, decoupling, load balancing, deferred processing, and data streaming.

Don't use it for small data requirements or synchronous communication; it also does not give the producer an acknowledgement.
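The publish/subscribe flow above can be sketched with a tiny in-process broker (a stand-in for RabbitMQ or Kafka; topic and subscriber names are made up):

```python
from collections import defaultdict

class MessageBroker:
    """Minimal in-process publish/subscribe: every subscriber on a topic
    receives each published message, unlike a one-to-one queue where a
    message is consumed exactly once."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)

broker = MessageBroker()
received = []
broker.subscribe("orders", lambda msg: received.append(("billing", msg)))
broker.subscribe("orders", lambda msg: received.append(("shipping", msg)))
broker.publish("orders", {"id": 42})
print(received)  # both subscribers see the same message
```

A real broker adds the parts this sketch omits: durable storage, delivery over the network, and per-subscriber cursors.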

Scaling

To scale queues, there could be a round-robin of multiple queues for sending and receiving messages, and there could also be primary and secondary queues.

Kafka vs Rabbit MQ/ActiveMQ

Kafka deals in small events, is much faster, and streams events, whereas RabbitMQ is heavier, takes in larger objects, and works with messages rather than streams.

Kafka is better for logging, heartbeats, and small object transfers; RabbitMQ is used for transactional data and carrying large objects.

Kafka supports only publish/subscribe; RabbitMQ does one-to-one, one-to-many, and also topics.


Performance Metrics

Throughput: Amount of work done in a particular time (e.g. 20 requests per second)
Bandwidth: Capacity to transfer data from one end to the other; network capacity, etc.
Response Time: Time taken per request (e.g. 1 second per request)

Measuring Performance:
Database: time taken on the DB, indexing, avoiding too many joins
Cache: Latency to write
Message Queues: Speed of pushing and pulling the messages in the queue
Workers: Performance, time taken, memory usage
Instance Performance: RAM and CPU usage

Tools to measure performance: New Relic, Datadog, VividCortex, Azure Monitoring


Fault & Failures:

Transient Fault

Permanent Fault


Network Fault (Timeout)
Hardware Fault
Can be solved with replication, load balancer, multi-region deployment

Database:
Replication, failover mechanism

Snapshot: A DB transaction log or snapshot is taken at a particular time; if the DB goes down, it can be restored to the point at which it went down.

Database Replication

Replication means keeping a copy/replica.
Primary -> Secondary, Master -> Slave
When something goes wrong with the DB, on failover the slave becomes the master.

Usually, the primary DB is used for writes and updates, and the secondary is used for reads, hence distributing the load.

Replication also increases performance: Because it can be geographically near to the client/app. It saves network time.

Replication Lag: The delay between a write on the primary and its arrival on the replicas; when it grows, clients reading from replicas get inconsistent data.



So there are strategies to make data consistent;

Read-after-write (synchronous update): the master only completes the write once it's written to the slaves too. It takes time, but the system is always consistent.

Asynchronous Replication: the master sends the write but doesn't wait for the replicas to complete it. It's much faster, but inconsistent.

Hybrid (semi-synchronous) update: the primary waits for only one replica to acknowledge, or for N replicas where N is usually less than the total number of replicas (N is called the quorum).

So in high-consistency applications, e.g. banking, synchronous replication is used; in other scenarios asynchronous will suffice.

How replication works: it can be partial (only a particular table is replicated), streamed (all writes are replicated as they happen), or bulk (after some threshold, data is dumped to the replicas).

Memory Usage
CPU Usage: Multiple CPUs

Bugs
Canary Deployment, Unit Testing, QA, Regression, Deployment Multi-Region
UI Should handle gracefully

Scaling:

Vertical: when you increase the capacity of existing resources (RAM, CPU, etc)
Horizontal: When you increase the resources themselves (CPUs, Nodes)

CAP Theorem:


C: Consistency: the system is always consistent, whatever the condition
A: Availability: the system is always available, all the time
P: Partition Tolerance: the system still works even when the network partitions and nodes can't reach each other

The theorem states that a distributed system can't have 100% availability and 100% consistency at the same time.
That means we have to choose between degrees of availability and consistency, and pick a trade-off while designing the system.

For example: when bank branches are not connected, a person can withdraw an amount from one branch and then do the same at another branch. He might withdraw more than he has. So either the branches have to be connected, or he should only be able to withdraw from his own branch.
For deposits, though, he can deposit at multiple branches. That ensures availability over consistency while depositing (updating), but consistency over availability while withdrawing.
So the overall system is somewhat available and somewhat consistent.


Database Partitioning

A database is partitioned when it can't be scaled further on a single physical instance.
There is horizontal partitioning (partitioning tables by rows) and vertical partitioning (partitioning tables by columns).

Horizontal partitioning is called sharding.

Database Sharding

Sharding is horizontally partitioning data.
Logical sharding: sharding according to some logic; the shards could be on the same machine or on different ones.
Physical sharding: the data is physically on other machines.

Sharding has advantages beyond raw data volume: it distributes the query load too, and since the data is distributed, if one partition goes down you can still serve the users whose data lives on the other partitions.

Sharding Strategies

Dynamic Sharding: 

The client asks a sharding module or another service for the data's location.

Directory-based Sharding: you have a lookup table that decides where data should be located among your shards. Usually the distribution is anticipated up front, so data is partitioned evenly.
The disadvantages: the lookup table shouldn't be large, and it must always be available, because without it you can't know where to read or write data; it becomes a single point of failure.

Algorithmic Sharding: 

A function within the client decides which of the partitions a piece of data goes to.

Key-based Sharding: you apply a hash function to some key, and the result decides which partition the data is written to or read from. It's fairly consistent and data is evenly partitioned, but a problem arises when a partition is added or removed: you have to recalculate the hashes and move data around to keep the scheme consistent.
Sometimes a key can be used for sharding without a hash function too, e.g. country/city in some cases. But the key should be immutable, otherwise you'll have to migrate data to different shards.

A shard key could also be a combination of multiple keys.
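A minimal sketch of key-based sharding (the shard names, MD5, and modulo routing are illustrative choices, not a prescription):

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # assumption: four physical partitions

def shard_for(key: str) -> str:
    """Hash the shard key and use modulo to pick a partition.
    Deterministic: the same key always lands on the same shard."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[digest % len(SHARDS)]

print(shard_for("user:1001"))
print(shard_for("user:1001") == shard_for("user:1001"))  # True: stable routing
# The drawback noted above: changing len(SHARDS) remaps most keys,
# forcing a large data migration.
```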

Range-based Sharding: you partition data based on a range, maybe months, the first character of a username, or even user ID or date. This usually works best when the range is used in queries often: records sharded by month are efficient to fetch by date because the system knows where to search.
The downside can be uneven distribution: some nodes may hold much more data than others. Those nodes are called hotspots.

Drawbacks Of Sharding

When data is not evenly distributed, some shards are overloaded and others underused.
Combining data across shards is hard, and reversing a sharding scheme is complex.
A query that searches through multiple shards is much slower because of the overhead of combining the data.
Not all databases provide sharding capabilities.

Hashing

Hashing is computation on data that converts it to a number; the same data must return the same hash every time. So when we divide load/data between servers, we use a hash of the data to decide which server a request or record should go to. But a problem arises when we add or remove servers: we have to redistribute the data.


Consistent Hashing: A hash is also computed for each server, in the same hash space the data hashes into.
So S1 could have a hash of 1, S2 a hash of 64, etc. When data is distributed among servers, a key hashing greater than S1's hash and up to S2's hash goes to S2, and keys between S4's hash and S1's (wrapping around the ring) go to S1. If one of the servers goes missing, only that server's range has to be redistributed; that saves a lot of data movement, and that is the advantage of consistent hashing.
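The ring described above can be sketched as follows (server names, MD5, and a single hash point per server are illustrative; real implementations add virtual nodes for smoother balance):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Servers and keys share one hash space; a key belongs to the first
    server clockwise from its hash. Removing a server only remaps the
    keys that server owned."""
    def __init__(self, servers):
        self.ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def server_for(self, key: str) -> str:
        h = self._hash(key)
        points = [point for point, _ in self.ring]
        idx = bisect.bisect_right(points, h) % len(self.ring)  # wrap around
        return self.ring[idx][1]

    def remove(self, server: str):
        self.ring = [(p, s) for p, s in self.ring if s != server]

ring = ConsistentHashRing(["S1", "S2", "S3", "S4"])
owner = ring.server_for("order-123")
ring.remove(owner)
# Only keys owned by the removed server move to the next server clockwise.
print(ring.server_for("order-123"))
```

With plain modulo hashing, removing one of four servers remaps roughly three quarters of all keys; here only the removed server's range moves.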

Event-Driven Architecture (Request/Reply)

You communicate asynchronously with events, primarily over queues, and wait on a reply queue for the response. To make queue messages idempotent, you can add a request ID and verify that the same ID isn't processed twice when receiving messages.


Distributed Logging

It's important to be able to log your services and build a context out of the logs: to follow one order ID or transaction ID through the system and see what actually happened over the life of an order.

It's really necessary to have one correlation ID per request across all the services; then all the requests can be attached to one hierarchy.

It's also good to use one wrapper library for logging across all microservices, so if we change the logging system in the future we only change it in one place.

Client-Server Communication


HTTP Request: The client sends a request (over a TCP handshake) and receives a response.
HTTP Polling: The client keeps sending HTTP requests to get updated responses.
HTTP Long Polling: The client sends a request with a long timeout, and the server replies when it has a message for the client.
WebSockets: The client makes a handshake/first request to the server, and then the server can send messages to the client and vice versa until the connection closes.
It's expensive: it hijacks the entire HTTP connection.
Server-Sent Events (push notifications): The client initially makes a long-lived connection using the text/event-stream content type, and then the server keeps sending responses multiple times; the client can't reply on the same channel, so to send data back it needs to make a separate HTTP request. The payload is not one JSON response; it's bite-sized, lightweight chunks of data.
It's used in live feeds, push notifications, etc.
The browser has a built-in EventSource object that keeps receiving events from the server.
Prefer HTTP/2, because with HTTP/1.1 each event stream ties up a whole connection.





Wednesday, August 25, 2021

Microservices With .Net Core

 SOA

you structure your application by decomposing it into multiple services (most commonly as HTTP services) that can be classified as different types like subsystems or tiers. Microservices derive from SOA, but SOA is different from microservices architecture. Features like big central brokers, central orchestrators at the organization level, and the Enterprise Service Bus (ESB) are typical in SOA. But in most cases, these are anti-patterns in the microservice community. In fact, some people argue that “The microservice architecture is SOA done right.”

Microservices architecture

As the name implies, a microservices architecture is an approach to building a server application as a set of small services. Each service runs in its own process and communicates with other processes using protocols such as HTTP/HTTPS, WebSockets, or AMQP. Each microservice implements a specific end-to-end domain or business capability within a certain context boundary, and each must be developed autonomously and be deployable independently. Finally, each microservice should own its related domain data model and domain logic (sovereignty and decentralized data management) based on different data storage technologies (SQL, NoSQL) and different programming languages.
As an additional benefit, microservices can scale-out independently. Instead of having a single monolithic application that you must scale out as a unit, you can instead scale-out specific microservices. That way, you can scale just the functional area that needs more processing power or network bandwidth to support demand, rather than scaling out other areas of the application that do not need to be scaled. That means cost savings because you need less hardware.
The microservices approach allows agile changes and rapid iteration of each microservice because you can change specific, small areas of complex, large, and scalable applications.

An important rule for microservices architecture is that each microservice must own its domain data and logic. Just as a full application owns its logic and data, so must each microservice own its logic and data under an autonomous lifecycle, with independent deployment per microservice.

Data consistency is hard in microservices; therefore different services use different data storage, e.g. SQL, NoSQL, or even graph databases. This is polyglot persistence.

A microservice is like a Bounded Context in DDD. That is a good way to start defining or dividing your system.

Challenge 1: How to define Boundaries Of Microservices.

Usually, a Bounded Context is a good place to start: each service should have its own context. The same entity could be referred to differently in different contexts, e.g. a user in the auth service is a customer in the order service. They could have the same data, or the same identity with different attributes.

Challenge #2: How to create queries that retrieve data from several microservices


API Gateway 
CQRS with query/reads tables
“Cold data” in central databases.

Challenge #3: How to achieve consistency across multiple microservices

No microservice should ever include tables/storage owned by another microservice in its own transactions, not even in direct queries; microservices should use eventual consistency, probably based on asynchronous communication such as integration events (message- and event-based communication).

Challenge #4: How to design communication across microservice boundaries 

In this context, communication means how coupled your microservices should be. Depending on the level of coupling, when a failure occurs, the impact of that failure on your system will vary significantly

For instance, imagine that your client application makes an HTTP API call to an individual microservice like the Ordering microservice. If the Ordering microservice, in turn, calls additional microservices using HTTP within the same request/response cycle, you're creating a chain of HTTP calls. It might sound reasonable initially. However, there are important points to consider when going down this path:
• Blocking and low performance. Due to the synchronous nature of HTTP, the original request doesn't get a response until all the internal HTTP calls are finished. Imagine if the number of these calls increases significantly and at the same time one of the intermediate HTTP calls to a microservice is blocked. The result is that performance is impacted, and the overall scalability will be exponentially affected as additional HTTP requests increase.
• Coupling microservices with HTTP. Business microservices shouldn't be coupled with other business microservices. Ideally, they shouldn't "know" about the existence of other microservices. If your application relies on coupling microservices as in the example, achieving autonomy per microservice will be almost impossible.
• Failure in any one microservice. If you implement a chain of microservices linked by HTTP calls, when any of the microservices fails (and eventually they will fail) the whole chain of microservices will fail. A microservice-based system should be designed to continue to work as well as possible during partial failures. Even if you implement client logic that uses retries with exponential backoff or circuit breaker mechanisms, the more complex the HTTP call chains are, the more complex it is to implement a failure strategy based on HTTP. In fact, if your internal microservices are communicating by creating chains of HTTP requests as described, it could be argued that you have a monolithic application, but one based on HTTP between processes instead of intra-process communication mechanisms.

Therefore, to enforce microservice autonomy and have better resiliency, you should minimize the use of chains of request/response communication across microservices. It’s recommended that you use only asynchronous interaction for inter-microservice communication, either by using asynchronous message- and event-based communication, or by using (asynchronous) HTTP polling independently of the original HTTP request/response cycle. 

API Gateway Pattern

API Gateway, or BFF (Backend for Frontend), is a useful pattern in microservices. Instead of exposing all the microservices to the outside world, it's better to create an API gateway for the outside world to use. It acts as a facade between client applications and microservices, prevents too many round trips for an SPA/client app aggregating data, and gives a single place to implement logging, authentication and authorization, caching, IP whitelisting, retries, circuit breakers and QoS, header/query-string/claims transformation, rate limiting and throttling, and load balancing. But there can be drawbacks to API gateways too: a single point of failure, a potential bottleneck for your system, and increased development effort.

There are many API gateways on the market, e.g. Azure API Management or Ocelot.

Upstream URL: the URL being requested from the outside
Downstream: the local service the gateway forwards to

There is an n-to-n relationship between upstream and downstream, which means an upstream can map to multiple downstreams and vice versa.

Microservice and Service Registry

Microservices should have unique names and be discoverable. A service shouldn't be addressed as an IP running on some machine (things can go bad), and it shouldn't depend on the infrastructure it's running on. For that, there needs to be a service registry: if one machine fails, the service is still discoverable through the registry.
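A toy in-memory registry sketch of that idea (illustrative only; real systems use something like Consul, etcd, or Eureka, and add health checks). Services register named addresses, and callers resolve by name instead of hard-coding an IP; the names and addresses below are made up:

```python
import random

class ServiceRegistry:
    def __init__(self):
        self._services = {}  # service name -> set of "host:port" addresses

    def register(self, name, address):
        self._services.setdefault(name, set()).add(address)

    def deregister(self, name, address):
        # Called when an instance dies or is drained.
        self._services.get(name, set()).discard(address)

    def resolve(self, name):
        # Pick any registered instance; callers never hard-code an IP.
        instances = self._services.get(name)
        if not instances:
            raise LookupError(f"no instances registered for {name!r}")
        return random.choice(sorted(instances))

registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")
registry.deregister("orders", "10.0.0.5:8080")  # that machine went bad
print(registry.resolve("orders"))  # still discoverable: 10.0.0.6:8080
```

The caller only ever knows the name "orders"; which machine answers is the registry's problem.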

Micro-frontends (Composite UI)


Resiliency And High Availability

A microservice should be resilient and available all the time: it should restart itself after failure, and if a faulty version is deployed, it should be able to roll back to the stable one. Resilient also means no state loss or data loss should occur. For that, we implement patterns like circuit breaker, retry strategies, and exponential backoff, using libraries like Polly in .NET Core.

Health management, logging, and diagnostics are also part of resilience and high availability; there should be standardized health checks, event logging, and tracing, plus crash reporting and machine-restart alerts. Logging should go to a standardized output stream, e.g., Microsoft.Diagnostics.EventFlow, which collects streams from multiple sources and publishes them to an output system. Orchestrators and clouds (Azure, AWS, etc.) should also provide diagnostic monitoring.

Exponential Backoffs

A retry policy for when you get a failure from an API: you retry the first time immediately, the second time after 5 seconds, the third after 25 seconds, with the delay growing exponentially.
This avoids bombarding the other service with n immediate retries.
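A minimal sketch of the policy in Python (libraries like Polly in .NET, or tenacity in Python, do this with jitter and more options; the tiny delays here are just to keep the example fast):

```python
import time

def retry_with_backoff(operation, max_attempts=4, base=2.0, first_delay=0.01):
    """Retry `operation`; sleep first_delay * base**attempt between failures."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(first_delay * (base ** attempt))  # 0.01s, 0.02s, 0.04s...

calls = {"n": 0}
def flaky():
    # A stand-in for an HTTP call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"

result = retry_with_backoff(flaky)
print(result)  # "ok", reached on the third attempt
```

Production implementations usually add random jitter to the delay so that many clients retrying at once don't synchronize into waves.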

Circuit Breaker Pattern

A pattern that wraps or intercepts HTTP calls to a service and checks whether they return a success or failure response. If more than n percent of requests are timing out or failing, it breaks the circuit, lets the unresponsive service recover, and stops the cascading failures caused by timeouts keeping threads busy. It helps the system fail fast rather than reach a state of exhaustion.

It has three states:

Closed: all requests go through.
Open: no calls go through; once the circuit opens, a timer starts, and when it expires the breaker moves to half-open.
Half-Open: a limited number of trial requests are let through; if they succeed, the circuit closes again, and if they still fail, it goes back to open and the timer restarts.

Fallback: when the API fails we want a defined policy: fall back to a backup API, serve a cached response if that's not possible, or just fail.
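A toy breaker showing the state transitions (a sketch, not a production implementation; Polly in .NET or pybreaker in Python handle this properly, with rolling failure percentages rather than a simple consecutive-failure count):

```python
import time

class CircuitBreaker:
    def __init__(self, threshold=3, reset_timeout=30.0):
        self.threshold = threshold          # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds before a trial call is allowed
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, operation):
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"  # timer expired: let one trial request in
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"   # trip (or re-trip) and restart the timer
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.state = "closed"         # success closes the circuit
        return result

breaker = CircuitBreaker(threshold=2, reset_timeout=60)
def failing():
    raise ConnectionError("timeout")

for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
print(breaker.state)  # "open": further calls now fail fast, sparing the service
```

While open, callers get an immediate error (or a fallback) instead of tying up a thread waiting for a timeout.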

Bulkhead Pattern

If a service calls lots of downstream services and one of them is slower than the others, that call path can end up hogging shared resources such as the thread pool. To solve that, we allocate resources per downstream service: each service gets an allocated bandwidth of concurrent calls, and if that limit is exceeded, the next call is rejected until previous calls have completed.
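A sketch of a bulkhead as a per-service semaphore: at most `max_concurrent` calls to one downstream can be in flight, and the call that would exceed the limit is shed immediately instead of queuing up and hogging a thread. The Event/Semaphore plumbing below just simulates a slow downstream deterministically:

```python
import threading

class Bulkhead:
    """Cap concurrent calls to one downstream service."""
    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full: rejecting call")
        try:
            return operation()
        finally:
            self._slots.release()

bulkhead = Bulkhead(max_concurrent=2)
gate, started, outcomes = threading.Event(), threading.Semaphore(0), []

def slow_operation():
    started.release()     # signal that this call now holds a slot
    gate.wait(timeout=5)  # simulate a slow downstream service
    return "done"

workers = [threading.Thread(
               target=lambda: outcomes.append(bulkhead.call(slow_operation)))
           for _ in range(2)]
for w in workers:
    w.start()
started.acquire(); started.acquire()  # wait until both slots are occupied
try:
    bulkhead.call(lambda: "done")     # third concurrent call is shed, not queued
except RuntimeError:
    outcomes.append("rejected")
gate.set()                            # let the slow calls finish
for w in workers:
    w.join()
print(sorted(outcomes))  # ['done', 'done', 'rejected']
```

The key property: the slow dependency can exhaust only its own two slots, never the caller's whole thread pool.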

Integration Event

An event published on a common event bus and subscribed to by any interested service. It's decoupled because publisher and subscriber don't know about each other. Any message broker can be used to achieve that.
    1. Achieving idempotency with events: the queue's built-in capabilities can be used, e.g., RabbitMQ adds a redelivered flag to unacknowledged, redelivered events; alternatively, assign an ID to each event and persist it in the subscribing microservices.

These events can be implemented with a message broker like RabbitMQ, which uses the AMQP messaging protocol: messages go to an exchange first and are then delivered to consumers.
It has multiple exchange types: direct, fanout, and topic. Direct sends a message only to the queue whose binding key strictly matches the published routing key; fanout is not strict and delivers every message to all bound consumers; topic is a middle ground between the two: it matches string patterns (wildcards) to filter which subscribers receive the message.

Each message needs an acknowledgment before it is deleted from the queue.
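A sketch of the consumer-side idempotency idea from point 1 above: because a broker may redeliver a message, the handler records processed event IDs and skips duplicates. The event shape and inventory example are made up; in production the processed-ID set lives in the service's own database, updated in the same transaction as the state change:

```python
processed_ids = set()      # in production: a table in the service's database
inventory = {"sku-1": 0}

def handle_order_placed(event):
    if event["id"] in processed_ids:
        return "duplicate: acknowledged but ignored"
    inventory[event["sku"]] += event["qty"]
    processed_ids.add(event["id"])  # persist atomically with the state change
    return "processed"

event = {"id": "evt-42", "sku": "sku-1", "qty": 3}
print(handle_order_placed(event))  # processed
print(handle_order_placed(event))  # a redelivered copy is ignored
print(inventory["sku-1"])          # 3, not 6
```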


Event Sourcing Pattern

You store only domain events in the database and replay them to get the current state of the object; for efficiency, you can also save snapshots of the current state.
It handles multiple concurrent requests on the same data and maintains an audit log. Consistency is loose, but scalability is high.
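The store-and-replay idea in miniature, using a hypothetical bank-account aggregate: only events are persisted, and the balance is a pure function of the log (a snapshot just lets you replay fewer events):

```python
events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]

def replay(events, snapshot=0):
    """Rebuild current state by folding events over a starting snapshot."""
    balance = snapshot
    for e in events:
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance

print(replay(events))                # 75: state derived purely from the log
snapshot = replay(events[:2])        # 70: persist this periodically...
print(replay(events[2:], snapshot))  # ...then replay only newer events: 75
```

The full log doubles as the audit trail: every past state is recoverable by replaying a prefix.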

Docker Container

For each service instance, you use one container.
Docker images/containers are units of deployment.
A container is an instance of a Docker image.
A host VM runs many containers.

Transaction Outbox Pattern

Instead of updating the database and publishing an event in two separate, non-atomic steps, the service writes the event into an "outbox" table within the same local transaction as the state change; a separate relay process then reads the outbox and publishes the events to the message broker.

Service Mesh

In software architecture, a service mesh is a dedicated infrastructure layer that facilitates service-to-service communication between services or microservices using a proxy. It helps deploy new code without disturbing the old and migrates traffic (e.g., canary deployments). Service discovery can be outsourced to the mesh, and load balancing and fault tolerance can also be handled by it.

AWS Lambda

The function is stored in AWS storage; when the trigger (defined on the function) occurs, the function is deployed to compute and starts executing. It runs one function instance per request at a time, scales automatically, and is torn down automatically too. It is triggered by events and can also trigger other functions.


Testing

Unit Testing

Integration Testing

Functional Testing







Tuesday, August 24, 2021

Interview Preparation

Big O Notations

Programming Paradigms

OS

Race Condition

When two threads share a resource and the outcome depends on which gets there first, the loser can read bad data, e.g., reading a balance before a withdrawal has completed.
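The balance example as a sketch: withdrawing is a read-modify-write, so the check and the subtraction must be atomic. Without the lock below, both threads could read balance=100, both pass the check, and both withdraw:

```python
import threading

balance = 100
lock = threading.Lock()

def withdraw(amount):
    global balance
    with lock:                 # makes check-and-withdraw one atomic step
        if balance >= amount:
            balance -= amount  # only runs if the check passed under the lock

threads = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 20: exactly one of the two 80-withdrawals succeeded
```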

Deadlock

When two threads are each waiting for the other to release a resource, both threads are deadlocked.

Thread A needs a DB connection that thread B has locked, and thread B needs a file that thread A has locked. Now both wait forever for each other to release the locks.
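The classic fix is a consistent global lock order: if every thread acquires the DB lock before the file lock, the circular wait in the example above can never form. A sketch:

```python
import threading

db_lock, file_lock = threading.Lock(), threading.Lock()
log = []

def worker(name):
    with db_lock:          # every thread takes the DB lock first...
        with file_lock:    # ...and only then the file lock
            log.append(name)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(log))  # ['A', 'B']: both finish; no circular wait is possible
```

Had worker B acquired file_lock first and db_lock second, the two threads could each grab one lock and block forever on the other.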

Starvation

A thread is unable to acquire a resource it needs because threads with higher priority keep acquiring it first.

Mutex (mutual exclusive)

Semaphore

Live Lock

Livelock is a situation where a request for an exclusive lock is denied repeatedly because many overlapping shared locks keep interfering with each other. The processes keep changing their status in response to each other, which prevents any of them from completing its task.

Asynchronous 

When a program sends a request to a lower-level control (HTTP, a file read, a database, etc.), it doesn't wait for it to finish; the thread continues its own work. Nothing is blocked.

Multiprocessing 

Spin up multiple processes with their own memory and resources and divide tasks among them 

Multithreading

Threads are parts of a process and have access to the same memory.

Cooperative multitasking vs Preemptive multitasking

Preemptive scheduling has to solve a hard problem -- getting all kinds of software from all kinds of places to efficiently share a CPU.

Cooperative scheduling solves a much simpler problem -- allowing CPU sharing among programs that are designed to work together.

So cooperative scheduling is cheaper and easier when you can get away with it. The key thing about small devices that allows cooperative scheduling to work is that all the software comes from one vendor and all the programs can be designed to work together.


Performance Optimization

Latency: The time to completion of a task. Measured in time units.

Throughput: The number of tasks completed in a given period. Measured in tasks per time unit, typically per second.


SOLID PRINCIPLES

OOP

Interfaces vs Abstract classes with C# 8

  • An interface is like a contract you define to expose functionality, whereas an abstract class is an abstraction of some functionality or implementation.
  • Interfaces can be multiply inherited, and structs can implement them too, but an abstract class behaves like a class: single inheritance only.
  • Conceptually, a derived class is a specialization of its parent class, while an interface represents a common contract of functionality. A ConsoleLogger deriving from BaseLogger would be an enhanced or extended version of that common functionality, but a ConsoleLogger implementing ILogger just fulfils the functions ILogger defines.
  • With C# 8, an interface can carry a default implementation, which implementers can either use as-is or override, just as an abstract class's functionality is extendable.
  • Multiple interfaces can be implemented, and the relationship needn't be "is a"; but only a single base class is allowed, and the relationship with it should be "is a".
  • Avoid the diamond problem, please.

Design Patterns

Database

RDBMS: Relational database management system
Data Integrity: All data should be correct in all the ways we retrieve it; prevent redundant data.
    Entity Integrity: Each row exists as it is supposed to (uniquely identifiable).
    Domain Integrity: Each column holds only the kind of data expected for it.
    Referential Integrity: Relations between tables stay integral (foreign keys point to existing rows).
Anomalies: Errors in data integrity.

DDL: Data definition language
DML: Data Manipulation Language

Atomic Values: Each column should hold atomic values, i.e., a single value per column.

Relationships

  • One to one: either keep the column within the same table, or define a foreign key in the other table that is unique and not nullable
  • One to many: a foreign key within the "many" table
  • Many to many: a junction table holding both tables' IDs
Keys:

Super key: any set of columns that ensures uniqueness in the table
Candidate key: a minimal set of columns that ensures uniqueness (the primary key is a chosen candidate key)
Foreign Key: the primary key of another table
Surrogate Key: a key with no real meaning in the domain, like UserID or StoreId, used as the primary key
Natural Key: a domain attribute used as the primary key
Simple Key: a one-column key
Composite/Compound Key: a multiple-column key

Modeling

ERD: Entity Relationship Diagram

Indexes

A clustered index is created when we define a PK on a column. The data itself is stored in a tree, sorted by the clustered index key.

A non-clustered index maps its key values to clustered-index keys in a separate tree. Once it finds the clustered key in that tree, it can easily look up the actual row data via the clustered key.
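A rough simulation of that two-step lookup (a teaching sketch, not how a B-tree is really implemented): rows are kept sorted by the clustered key (the PK), and a hypothetical non-clustered index on email maps each email to a PK, so a lookup by email is an index seek followed by a key seek:

```python
import bisect

# Table rows, stored sorted by the clustered key (the PK).
rows = sorted([(3, "carol@x.com"), (1, "alice@x.com"), (2, "bob@x.com")])
clustered_keys = [pk for pk, _ in rows]
# Non-clustered index on email: maps email -> clustered key, kept sorted.
email_index = sorted((email, pk) for pk, email in rows)

def find_by_pk(pk):
    # Binary search over the sorted clustered keys, like a B-tree seek.
    i = bisect.bisect_left(clustered_keys, pk)
    if i < len(rows) and clustered_keys[i] == pk:
        return rows[i]
    return None

def find_by_email(email):
    keys = [e for e, _ in email_index]
    i = bisect.bisect_left(keys, email)          # seek in the secondary index
    if i < len(keys) and keys[i] == email:
        return find_by_pk(email_index[i][1])     # second seek via clustered key
    return None

print(find_by_email("bob@x.com"))  # (2, 'bob@x.com')
```

This is why a lookup through a non-clustered index costs two seeks, and why a fat clustered key makes every non-clustered index bigger.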





Operating Systems

.NET

Fundamentals

  • .NET apps run managed code in a runtime environment known as the Common Language Runtime (CLR).
  • The .NET CLR is a cross-platform runtime that includes support for Windows, macOS, and Linux. The CLR handles memory allocation and management. The CLR is also a virtual machine that not only executes apps but also generates and compiles code using a just-in-time (JIT) compiler.
  • Higher-level .NET languages, such as C#, compile down to a hardware-agnostic instruction set, which is called Intermediate Language (IL). When an app runs, the JIT compiler translates IL to machine code that the processor understands. JIT compilation happens on the same machine that the code is going to run on. So, On run time .net compiles the app and optimizes it too.
  • The default experience for most .NET workloads is the JIT compiler, but .NET offers two forms of ahead-of-time (AOT) compilation: iOS requires full AOT, and ReadyToRun, where dotnet produces a ready-to-run image if you set PublishReadyToRun to true and specify the target OS and processor.

Working with unmanaged resources

Sometimes code needs to reference unmanaged resources. Unmanaged resources are resources that aren't automatically maintained by the .NET runtime. For example, a file handle is an unmanaged resource. A FileStream object is a managed object, but it references a file handle, which is unmanaged. When you're done using the FileStream, you need to explicitly release the file handle.

In .NET, objects that reference unmanaged resources implement the IDisposable interface. When you're done using the object, you call the object's Dispose() method, which is responsible for releasing any unmanaged resources. The .NET languages provide a convenient using statement (C#, F#, VB) that ensures the Dispose method is called.


  • Source code (.cs) files are compiled into MSIL (Microsoft Intermediate Language) also known as CIL (Common Intermediate Language) or IL (Intermediate Language) along with its metadata. MSIL is machine independent.
  • Metadata includes the programming language, environment, version, and class libraries: the types, members, and references in your code.
  • CLR contains JIT(Just In Time) compiler which converts MSIL to machine code to be executed on current machine.
  • The MSIL code is the only managed code, i.e., code the CLR manages; everything else is unmanaged code outside the scope of the CLR (Win32 APIs, COM components), managed by the OS.
  • The Common Type System (CTS) understands all the languages' types and translates them for the CLR. Reference types have memory allocated at runtime on the heap; value types are stored on the stack, with memory allocated at compile time.
  • The CLR provides a common execution environment for more than 60 .NET languages, memory management (e.g., garbage collection), and sharing of code and libraries across the languages.
  • The Common Language Runtime in the .NET Framework is the Virtual Machine component that handles program execution for various languages such as C#, F#, Visual Basic .NET, etc. The managed execution environment is provided by giving various services such as memory management, security handling, exception handling, garbage collection, thread management, etc.
For more details refer geekscode, msdn, msdn terms

What is Heap and what is Stack?

  • Both are memory locations; the Heap is global while the Stack is local.
  • The Heap is application-level while the Stack is thread-level.
  • The Stack has a defined last-in-first-out structure, while the Heap does not have a defined data structure.
  • The size is defined at creation time: for the Heap, when the application starts; for the Stack, when the thread is created.
  • Both can grow dynamically.
  • The Stack is faster than the Heap. A stack stays in cache and doesn't have to synchronize with other threads the way the Heap does.
  • The Stack stores values while the Heap stores objects.
.NET uses a managed heap, which is different from the heap the OS provides. The managed heap is allocated when the application starts.

Roslyn (Library, API): Compiles, Provide Diagnostic, Analyze Code. 

Compile Time Vs Run Time

C#

Types

Reference types are stored in the managed heap; value types are typically stored on the stack (or inline within a containing object).

Value Types

  • Value types are sealed
  • All primitive types, enums and structs are value types.

Reference types

  • A type defined as a class, delegate, array, or interface is a reference type
  • On declaration, reference types have a value of null by default

Nullable types

Ordinary value types can't have a value of null. However, you can create nullable value types by appending a ? after the type. For example, int? is an int type that can also have the value null. Nullable value types are instances of the generic struct type System.Nullable<T>. Nullable value types are especially useful when you're passing data to and from databases in which numeric values might be null. For more information, see Nullable value types.



Indexers (C# Programming Guide)

Indexers allow instances of a class or struct to be indexed just like arrays. The indexed value can be set or retrieved without explicitly specifying a type or instance member. Indexers resemble properties except that their accessors take parameters.

using System;

class SampleCollection<T>
{
    // Declare an array to store the data elements.
    private T[] arr = new T[100];

    // Define the indexer to allow client code to use [] notation.
    public T this[int i]
    {
        get { return arr[i]; }
        set { arr[i] = value; }
    }
}

class Program
{
    static void Main()
    {
        var stringCollection = new SampleCollection<string>();
        stringCollection[0] = "Hello, World";
        Console.WriteLine(stringCollection[0]);
    }
}

Entity Framework Core

Entity Framework (EF) Core is an open source and cross-platform data-access technology that can serve as an ORM. EF Core lets you work with a database by referring to .NET objects in code. It reduces the amount of data-access code you would otherwise need to write and test. EF Core supports many database engines.

For more information, see Entity Framework Core and Database Providers.

LINQ

Language-integrated query (LINQ) lets you write declarative code for operating on data. The data can be in many forms (such as in-memory objects, a SQL database, or an XML document), but the LINQ code you write typically doesn't differ by data source.

For more information, see LINQ (Language Integrated Query) overview.

ASP.NET

WebAPI

SOAP: 

SOAP has built-in retry logic, has WS-Security built in, and has a WSDL file describing the service's functions.

Rest API: A Design Approach


HTTP communication between software: it has verbs to communicate what to do, endpoints to identify where (which resource), and headers to communicate additional information.

Verbs

GET, POST, PUT, PATCH, DELETE, COPY, HEAD, OPTIONS, LINK, UNLINK, PURGE, LOCK, UNLOCK, PROPFIND, VIEW

Best Practices

0. Stateless
1. JSON data format
2. Use nouns (not GetUser but a POST to Users; not delete-book but DELETE on books)
3. Use plurals (Users, Books)
4. Use HTTP status codes (200, 201, 404, 500)
5. Paging, filtering, and sorting when getting collections
6. Versioning, e.g., localhost/v1/orders
7. Docs (Swagger)
8. Use SSL/TLS (HTTPS)


Throttling

Rate limiting: number of requests per user in a given time window
IP-level throttling: number of requests per IP
Concurrency level: number of concurrent requests per client
Resource level: throttling on the DB or a particular resource
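A common way to implement per-user rate limiting is a token bucket, sketched below: each client gets `capacity` tokens that refill at `rate` per second, and a request without an available token is throttled (an API would answer HTTP 429). The parameters here are made up:

```python
import time

class TokenBucket:
    def __init__(self, capacity, rate):
        self.capacity, self.rate = capacity, rate
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per user (or per IP, for IP-level throttling)

def handle_request(user):
    bucket = buckets.setdefault(user, TokenBucket(capacity=3, rate=1.0))
    return 200 if bucket.allow() else 429

statuses = [handle_request("alice") for _ in range(5)]
print(statuses)  # [200, 200, 200, 429, 429]: burst of 3, then throttled
```

The bucket also allows short bursts up to `capacity` while enforcing the average rate, which plain fixed-window counters do not.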

Authentication/Authorization

Session-based authentication is usually implemented as a cookie in the browser: the client sends the key for the session that the server generated when the client provided its credentials. The session is invalidated on logout.
Stateful sessions: the server has to remember the sessions, which means a Redis cache or some other memory store.

Basic Authentication: the client sends the username and password with every request; it's stateless.
In the Authorization header you put username:password, Base64-encoded:
you send Basic <Base64 of username:password>.
The reason for encoding is that any characters that aren't HTTP-header-compatible survive transport; it is encoding, not encryption.
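Building the header is a two-liner; the credentials below are obviously made up. Note again that Base64 is trivially reversible, which is why Basic auth must only ever travel over HTTPS:

```python
import base64

def basic_auth_header(username, password):
    # "username:password" -> Base64 -> "Basic <encoded>"
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_auth_header("alice", "s3cret")
print(header)  # Basic YWxpY2U6czNjcmV0
```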

Digest Access Authentication: Encrypted Token

Asymmetric Cryptography Authentication: an encrypted token that can only be decrypted by the server.

OAuth

A standard for delegated authorization: granting access and authenticating multiple clients, e.g., implementing single sign-on so one login works across multiple websites.

JWT (JSON Web Tokens)
A signed token carrying the client's information; when the client returns, we verify the signature.
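The sign/verify cycle sketched with only the standard library (a real app should use a JWT library such as PyJWT or Microsoft.IdentityModel; the secret below is a placeholder). The key point: the server recomputes the HMAC over header.payload and compares, so the token is tamper-evident, not encrypted:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"  # placeholder; never hard-code in real code

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe Base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def sign(payload: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(SECRET, f"{header}.{body}".encode(),
                          hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify(token: str) -> dict:
    header, body, sig = token.split(".")
    expected = b64url(hmac.new(SECRET, f"{header}.{body}".encode(),
                               hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    return json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))

token = sign({"sub": "user-1", "role": "admin"})
print(verify(token)["sub"])  # user-1
```

If a client edits the payload (say, role to admin) without the secret, the recomputed signature no longer matches and verification fails.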

Angular

Ajax

Javascript

HTTP

CSS

HTML


Microservices

Event Driven Architecture

Cloud Architecture


TDD

Unit Test

Integration Test

Functional Test

BDD

Cache

Queues



Version Control

CI/CD

MSBuild and the .NET CLI can be used with various continuous integration tools and environments, such as:

Azure
Docker
Encryption and Hashing
Idempotency

Agile Programming
Encryption

System Design Concept

1. Load balancing
2. Caching
3. Database schema design
4. Master-slave replication
5. Database sharding
6. API design