<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Raghav's Blog]]></title><description><![CDATA[Raghav's Blog]]></description><link>https://blog.raghavdev.in</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 14:54:05 GMT</lastBuildDate><atom:link href="https://blog.raghavdev.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Understanding the Core Concepts of AI: Part 1]]></title><description><![CDATA[I've been working as a software engineer at a startup for quite some time, and now I'm excited to move into the AI field. There are so many topics to explore, and it can feel overwhelming. To make it easier, I started by learning the basics of Large ...]]></description><link>https://blog.raghavdev.in/understanding-the-core-concepts-of-ai-part-1</link><guid isPermaLink="true">https://blog.raghavdev.in/understanding-the-core-concepts-of-ai-part-1</guid><category><![CDATA[AI]]></category><category><![CDATA[large language models]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[attention-mechanism]]></category><category><![CDATA[Tokenization]]></category><category><![CDATA[vector embeddings]]></category><category><![CDATA[Attention Is All You Need]]></category><category><![CDATA[Supervised learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[transformers]]></category><dc:creator><![CDATA[Raghav  Shukla]]></dc:creator><pubDate>Wed, 26 Nov 2025 11:41:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/fvxNerA8uk0/upload/1ecc541adaf4734ec7b27e0581244868.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've been working as a software engineer at a startup for quite some time, and now I'm 
excited to move into the AI field. There are so many topics to explore, and it can feel overwhelming. To make it easier, I started by learning the basics of Large Language Models to understand how they work. I found a lot of interesting topics, so I decided to write a series of blog posts about them. This series will cover the key <strong>Building Blocks of AI</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764149371957/70352bcb-0a57-4c3a-9e53-b1b661938ef8.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-large-language-model">Large Language Model</h2>
<p>A <strong>Large Language Model (LLM)</strong> is a large <strong>Neural Network</strong> made up of many <strong>transformer layers</strong>. It is trained to predict the next <strong>token</strong> in a sequence of input. The model breaks down the user input into tokens and represents them in <strong>vector format</strong>. Each transformer layer has multiple sub-layers, allowing each token to compare itself mathematically to all other tokens in the input. This process is repeated in every layer of the stack, and eventually, the model generates a <strong>probability distribution</strong> for the next token.</p>
<p>For example, if we type “All that glitters is not …” into <strong>ChatGPT</strong> or <strong>Gemini</strong>, it predicts “gold.” In the same way, if we ask a well-read person to name a story about a sinking ship, they will immediately suggest <strong>Titanic</strong>, because they have seen that pattern many times before.</p>
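<p>To make the idea of a probability distribution over the next token concrete, here is a minimal sketch. The candidate tokens and their scores are invented for illustration; a real model scores every token in its vocabulary.</p>

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to candidate next tokens
# after "All that glitters is not"; the numbers are made up.
candidates = ["gold", "silver", "true", "sofa"]
logits = [6.0, 3.0, 2.0, -1.0]

probs = softmax(logits)
next_token = candidates[probs.index(max(probs))]

for token, p in zip(candidates, probs):
    print(f"{token}: {p:.3f}")
print("prediction:", next_token)
```

<p>Picking the highest-probability token is only one option; chat models usually sample from the distribution instead, which is why answers can vary between runs.</p>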
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764126686927/0c02559c-671e-4cdb-bb43-f9856efec70a.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-tokenization">Tokenization</h2>
<p>User text → <strong>Tokenizer</strong> → <strong>Token IDs</strong></p>
<p>As mentioned earlier, the user's query is broken down into smaller parts (<strong>tokens</strong>) that the model can work with. This process is called <strong>tokenization</strong>. For example, if the user writes "All that glitters," the <strong>tokenizer</strong> might split it into pieces like “All,” “that,” “glit,” “ters” (the exact split depends on the tokenizer). Similarly, words such as "eating," "dancing," and "singing" can each be split into a stem plus a shared “ing” piece. What the model receives are not words but <strong>IDs</strong> that represent these pieces of text, because a <strong>Neural Network</strong> operates on numbers, not raw characters.</p>
<p>Tokenization is important because words can vary a lot: "run," "running," and "runners" are different words with related meanings. Splitting text into sub-word pieces gives the model a fixed-size <strong>vocabulary</strong> that can still represent any text. The tokenizer's output might look like [72, 1632, 9872, 3123, …], and these IDs are then sent to the <strong>embedding layer</strong>.</p>
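<p>The text-to-IDs step can be sketched with a toy tokenizer. This is a simplified greedy longest-match scheme over an invented vocabulary, not the algorithm any particular LLM uses; real tokenizers learn their vocabulary from data.</p>

```python
# Toy vocabulary mapping text pieces to IDs. The entries are invented;
# real vocabularies are learned and hold tens of thousands of pieces.
VOCAB = {"All": 0, " that": 1, " glit": 2, "ters": 3, " run": 4, "ning": 5, "s": 6}

# Add single characters as a fallback so any lowercase text can be encoded.
for ch in "abcdefghijklmnopqrstuvwxyzA ":
    VOCAB.setdefault(ch, len(VOCAB))

def tokenize(text):
    """Greedy longest-match: repeatedly take the longest known piece."""
    pieces = []
    i = 0
    while i != len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in VOCAB:
                pieces.append(text[i:end])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary entry for {text[i]!r}")
    return pieces

def encode(text):
    """Map each piece to its integer ID."""
    return [VOCAB[piece] for piece in tokenize(text)]

print(tokenize("All that glitters"))
print(encode("All that glitters"))
```

<p>Note that the leading space is part of the piece in this sketch, which is one common way tokenizers keep word boundaries without a separate "space" token.</p>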
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764127214570/dcf33ac4-6eb7-42b6-ad55-934c1089aac8.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-vectorization">Vectorization</h2>
<p>Text → <strong>Tokens</strong> → <strong>Token IDs</strong> → <strong>Vectors (Embeddings)</strong> → <strong>Transformer Layers</strong></p>
<p>The token IDs are fed into the <strong>embedding layer</strong>, which converts them into high-dimensional <strong>vectors</strong>. These vectors then pass through the stack of <strong>transformer layers</strong>, each of which contains an <strong>attention layer</strong> and <strong>feed-forward layers</strong>.</p>
<p>Words with similar meanings are placed close to each other. For example, "happy" and "joy" are positioned mathematically near each other. Each vector encodes the token’s meaning and its relationship to other tokens; the context of the surrounding sentence is mixed in later by the attention layers.</p>
<p>This process is essential because a <strong>Neural Network</strong> requires continuous values (floating-point numbers) to learn, and meaning has to be encoded in a form that can be compared mathematically. "Run" and "jog" should be close together, while "run" and "sofa" should be far apart. Vectors are learned during training and are not calculated by a formula.</p>
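<p>One common way to measure how "close" two vectors are is cosine similarity. The sketch below uses hand-made 3-dimensional vectors purely for illustration; real embeddings are learned and have hundreds or thousands of dimensions.</p>

```python
import math

# Hand-made vectors for illustration only; these values are assumptions,
# not embeddings taken from any real model.
EMBEDDINGS = {
    "run":  [0.9, 0.8, 0.1],
    "jog":  [0.8, 0.9, 0.2],
    "sofa": [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

run_jog = cosine_similarity(EMBEDDINGS["run"], EMBEDDINGS["jog"])
run_sofa = cosine_similarity(EMBEDDINGS["run"], EMBEDDINGS["sofa"])

print(f"run vs jog:  {run_jog:.2f}")
print(f"run vs sofa: {run_sofa:.2f}")
```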
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764129405624/b2991e8d-a882-4f76-bd21-51683a7c9463.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-attention">Attention</h2>
<p>Tokens → <strong>Token IDs</strong> → <strong>Vectors</strong> → <strong>Transformer Layer</strong> → <strong>Attention</strong> → <strong>Feed-forward</strong> → Next layer → Repeat</p>
<p><strong>Attention</strong> is a mathematical tool that allows each token to determine which other tokens are important and to what extent. It looks at the other tokens in the context to resolve ambiguity and calculates how much "attention" one token should give to each of them. For example, in the phrase “Apple’s Revenue,” the model focuses on “revenue” to understand that Apple refers to the company, not the fruit. This mechanism aids in understanding context.</p>
<p>The <strong>LLMs</strong> we see today exist because of the <strong>Transformer</strong> architecture, built on the <strong>Attention</strong> mechanism, introduced in the well-known paper <a target="_blank" href="https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"><strong>Attention is All You Need</strong></a> by researchers at Google in 2017. Before that, language models relied mostly on <strong>Recurrent Neural Networks (RNNs)</strong> and <strong>Long Short-Term Memory (LSTM)</strong> networks, which processed tokens one at a time from left to right and often lost the context from earlier tokens.</p>
<p>For example, in the sentence “The dog that chased the cat was hungry,” to understand "was hungry," the model needs to connect it back to "dog." RNNs had difficulty with this. With Attention, two words can be linked directly even if they are thousands of tokens apart, as long as both fit in the model's context window. Because all tokens are processed in parallel rather than one after another, training is also much faster. Attention is the core mechanism that modern LLMs are built around.</p>
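<p>The core computation can be sketched as scaled dot-product attention, the formulation used in the paper above. The query, key, and value vectors below are invented 2-dimensional numbers; in a real model they come from learned projections of the token embeddings.</p>

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up vectors for three tokens; chosen so "Apple" attends to "revenue".
tokens = ["Apple", "'s", "revenue"]
q = [[1.0, 0.0], [0.0, 0.1], [0.5, 0.5]]
k = [[0.2, 0.0], [0.0, 0.1], [3.0, 0.0]]
v = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

out, weights = attention(q, k, v)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

<p>Each row of weights sums to 1, so every output vector is a weighted average of the value vectors. Multi-head attention runs several of these in parallel with different projections.</p>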
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764144832331/0d638e0f-cc90-48de-9ce9-63c5824e3014.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-self-supervised-learning">Self-Supervised Learning</h2>
<p>Vectors → <strong>Transformer</strong> → Predict next token → Compute <strong>loss</strong> → <strong>Backpropagation</strong> → Update <strong>weights</strong></p>
<p><strong>Self-supervised learning</strong> is a training method where the model learns from unlabeled data by creating its own training labels, instead of relying on humans to label everything manually. It hides parts of the data and tries to guess what’s missing. For example, if the sentence is “the sky is ___” and the text actually continues with "blue," a model that puts most of its probability on "red" incurs a high <strong>loss</strong>, while one that favors "blue" incurs a low loss. <strong>Backpropagation</strong> then adjusts the weights so that the correct token becomes more likely next time.</p>
<p>Each time it predicts the next token, <strong>embeddings</strong> become more refined, <strong>attention weights</strong> are adjusted, and the multilayer representation improves. Gradually, the model learns that “cat” often appears near “fur,” “pet,” “animal,” so it creates the vector embedding accordingly. This intelligence comes from compressing patterns; <strong>Self-Supervision</strong> is essentially a large pattern compression engine.</p>
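<p>Two pieces of this loop are easy to show in isolation: how raw text supplies its own labels, and how the loss compares a prediction with the true next token. This is a minimal sketch with invented probabilities; it uses cross-entropy, the standard loss for next-token prediction, and leaves out the backpropagation step.</p>

```python
import math

def next_token_pairs(tokens):
    """Every prefix becomes an input; the token that follows is its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def cross_entropy(predicted_probs, target):
    """Loss is small when the model gives the true token high probability."""
    return -math.log(predicted_probs[target])

sentence = ["the", "sky", "is", "blue"]
pairs = next_token_pairs(sentence)
for context, target in pairs:
    print(context, "->", target)

# Invented model outputs for the context "the sky is".
mostly_red = {"red": 0.7, "blue": 0.2, "green": 0.1}
mostly_blue = {"red": 0.1, "blue": 0.8, "green": 0.1}

loss_wrong = cross_entropy(mostly_red, "blue")
loss_right = cross_entropy(mostly_blue, "blue")
print(f"loss when favoring red:  {loss_wrong:.2f}")
print(f"loss when favoring blue: {loss_right:.2f}")
```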
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764146099091/35b568a6-19fa-4894-86ef-bc3e245b8f6e.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-transformer">Transformer</h2>
<p>The <strong>Transformer</strong> is an architecture used in modern <strong>LLMs</strong> that employs the <strong>Attention</strong> mechanism to process all tokens simultaneously through <strong>self-attention</strong> and <strong>feed-forward networks</strong>. It forms the foundation of virtually all modern LLMs. The “T” in <strong>GPT</strong> stands for Transformer.</p>
<p>Each layer of a transformer consists of two main components:</p>
<ul>
<li><p><strong>Multi-Head Self-Attention</strong>: Generates <strong>Q</strong>, <strong>K</strong>, <strong>V</strong> vectors and calculates relevance.</p>
</li>
<li><p><strong>Feed-Forward Neural Network (FFNN)</strong>: Once attention provides context, the FFNN further transforms the vector.</p>
</li>
</ul>
<p>In addition to these two components, each transformer layer also includes normalization layers and residual connections. These ensure that information flows smoothly through very deep networks without vanishing or exploding, allowing transformers to scale to hundreds of layers. Residual connections help the model “remember” the original input signal while still applying complex transformations. Layer normalization stabilizes training and improves convergence, making transformers far more efficient and scalable than older architectures like RNNs or LSTMs.</p>
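<p>The residual connection and layer normalization can be sketched in a few lines. This follows the "add, then normalize" ordering of the original Transformer paper and omits the learnable scale and shift parameters; the feed-forward function is a made-up stand-in, not a trained network.</p>

```python
import math

def layer_norm(x, eps=1e-5):
    """Rescale a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def residual_block(x, sublayer):
    """Compute LayerNorm(x + sublayer(x)): the input is added back in."""
    added = [xi + si for xi, si in zip(x, sublayer(x))]
    return layer_norm(added)

def toy_feed_forward(x):
    # Hypothetical stand-in for a learned feed-forward network:
    # a fixed scale and shift followed by ReLU.
    return [max(0.0, 2.0 * xi - 1.0) for xi in x]

x = [0.5, 1.5, -1.0, 3.0]
y = residual_block(x, toy_feed_forward)
print([round(v, 3) for v in y])
```

<p>Because the input is added back before normalizing, a sub-layer that outputs zeros still passes the original signal through, which is what keeps very deep stacks trainable.</p>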
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764146190811/ff246250-ab48-405e-a07a-b9a72ac485e4.png" alt class="image--center mx-auto" /></p>
<p>This is the first step in my deep dive into how modern AI systems actually work. I’ll keep exploring the remaining building blocks (training, inference, optimizers, quantization, fine-tuning, and more) and publish them as the next parts of this series. If you want to follow the full breakdown end-to-end, the upcoming posts will connect everything together.</p>
]]></content:encoded></item><item><title><![CDATA[System Design 101 : Scale from Zero to Millions of Users]]></title><description><![CDATA[In the past three years, I’ve mostly worked with startups, and the common approach was always “build fast, optimize later.” It definitely helped us move quickly, but when the user base started growing, we ran into scaling issues that were tough to ha...]]></description><link>https://blog.raghavdev.in/system-design-101-scale-from-zero-to-millions-of-users</link><guid isPermaLink="true">https://blog.raghavdev.in/system-design-101-scale-from-zero-to-millions-of-users</guid><category><![CDATA[System Design]]></category><category><![CDATA[System Architecture]]></category><category><![CDATA[serverless]]></category><category><![CDATA[server]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Load Balancing]]></category><category><![CDATA[data]]></category><category><![CDATA[caching]]></category><category><![CDATA[cache]]></category><category><![CDATA[CDN]]></category><category><![CDATA[architecture]]></category><category><![CDATA[Data Center]]></category><category><![CDATA[message queue]]></category><category><![CDATA[sharding]]></category><dc:creator><![CDATA[Raghav  Shukla]]></dc:creator><pubDate>Tue, 26 Aug 2025 17:58:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/gcgves5H_Ac/upload/1d7b69a6181b646707ebfc6969973cf7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the past three years, I’ve mostly worked with startups, and the common approach was always <em>“build fast,</em> optimize <em>later.”</em> It definitely helped us move quickly, but when the user base started growing, we ran into scaling issues that were tough to handle in production.</p>
<p>That’s when I realized—it’s not about choosing between speed and scale. The real trick is finding a balance: shipping fast while also putting just enough foundation in place so the system can handle growth. After all, every startup dreams of reaching a million users, and being a little prepared early makes the ride much smoother.</p>
<p>Recently, while learning more about system design and reading the <em>ByteByteGo</em> book, I picked up a few lessons that I think are worth sharing.</p>
<h2 id="heading-single-server-setup">Single Server Setup</h2>
<p>To keep things simple, we’ll start by running everything on a <strong>single server</strong>. When a <strong>user</strong> accesses the website by entering a <strong>domain name</strong> like <a target="_blank" href="http://raghavdev.in"><strong>raghavdev.in</strong></a>, the <strong>DNS server</strong> returns the corresponding <strong>IP address</strong>. Once the IP is received, the browser sends an <strong>HTTP request</strong> to our <strong>server</strong>, which then responds with either <strong>HTML</strong> or <strong>JSON</strong> for rendering. This is what the <strong>initial setup</strong> of our system looks like:</p>
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-2-GPY73ZNO.png&amp;w=3840&amp;q=75" alt="Image represents a simplified client-server architecture diagram illustrating the process of accessing a website.  A user, represented by a web browser and a mobile app icons within a rounded rectangle labeled 'User,' initiates the process.  The user's device first sends a request (1) to a DNS server with the domain name 'api.mysite.com'. The DNS server responds (2) with the IP address '15.125.23.214'.  The user's device then uses this IP address (3) to connect to a web server, depicted as a green rectangle labeled 'Web server' within a dashed-line box.  The web server sends back an HTML page (4) to the user's device, completing the request.  The arrows indicate the direction of information flow, showing the request and response between the user's device, the DNS server, and the web server.  Numbers in circles correspond to the steps in the process." /></p>
<h2 id="heading-database">Database</h2>
<p>As we discussed earlier, the <strong>single server setup</strong> works well in the beginning, but it quickly becomes a limitation once the <strong>user base</strong> starts growing. To scale more effectively, we need to <strong>separate the web server and the database</strong> so that each can grow independently.</p>
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-3-2P4MNG7C.png&amp;w=3840&amp;q=75" alt="Image represents a simplified system architecture diagram showing the interaction between a user, a website, and a database.  A user, accessing via either a web browser or a mobile app, initiates a request to . This domain name is resolved to an IP address via a DNS (Domain Name System) server. The request then reaches a web server, labeled as such, which handles  requests.  Simultaneously, the mobile app makes a request to , which also points to the same web server. The web server acts as an intermediary, sending  requests to a database (labeled 'DB') and receiving 'return data' in response.  The dashed lines around the web server and database suggest these are separate components or services.  The overall flow depicts a typical client-server architecture with a database backend." /></p>
<p>Now a big question arises:</p>
<h3 id="heading-which-database-to-choose"><strong>Which Database to choose?</strong></h3>
<p>Broadly, there are two types of databases:</p>
<ul>
<li><p><strong>SQL (Relational Databases)</strong> → Store data in <strong>tables</strong> with <strong>rows and columns</strong>. Popular examples are <strong>MySQL</strong> and <strong>PostgreSQL</strong>.</p>
</li>
<li><p><strong>NoSQL (Non-Relational Databases)</strong> → Store data in more flexible formats like <strong>key-value pairs</strong>, <strong>documents</strong>, <strong>graphs</strong>, or <strong>columns</strong>. Examples include <strong>MongoDB</strong>, <strong>Cassandra</strong>, <strong>Neo4j</strong>, <strong>CouchDB</strong>, and <strong>Amazon DynamoDB</strong>.</p>
<p>  The right choice depends on the <strong>project’s requirements</strong>, and in many cases, teams even combine <strong>multiple databases</strong> to meet different use cases.</p>
</li>
</ul>
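<p>The difference in data shape can be illustrated with a small sketch. It uses Python's built-in <code>sqlite3</code> module as a stand-in for a relational database and a plain dictionary of JSON strings as a stand-in for a document store; the table, keys, and sample user are invented.</p>

```python
import json
import sqlite3

# Relational style: a fixed schema of rows and columns, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Asha", "Delhi"))
conn.commit()
row = conn.execute("SELECT name, city FROM users WHERE id = ?", (1,)).fetchone()
print("SQL row:", row)

# Document style: each record is a self-contained JSON document, and
# different records may carry different fields without a schema change.
documents = {}
documents["user:1"] = json.dumps({"name": "Asha", "city": "Delhi", "tags": ["beta"]})
documents["user:2"] = json.dumps({"name": "Ravi", "preferences": {"theme": "dark"}})
doc = json.loads(documents["user:1"])
print("Document:", doc)

conn.close()
```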
<h2 id="heading-load-balancer">Load Balancer</h2>
<p>Now that the <strong>web server</strong> and <strong>database</strong> are separated, a new problem arises when many <strong>users</strong> access the website at the same time. Once the <strong>server limit</strong> is reached, responses slow down, which is something we don’t want. To handle this, we add a <strong>Load Balancer</strong> that evenly distributes incoming traffic across multiple servers.</p>
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-4-2EGRRANZ.png&amp;w=3840&amp;q=75" alt="Image represents a simplified client-server architecture with load balancing.  A user, accessing via either a web browser or mobile app, initiates a request to . This request first goes to a DNS server, which resolves the domain name  to its corresponding public IP address, .  This IP address points to a load balancer, which receives the request from the user over the public IP. The load balancer then forwards the request to one of two servers (Server1 or Server2) using their private IP addresses ( and  respectively), distributing the load between them.  A table shows the domain name and its associated IP address mapping used by the DNS server.  The two servers are grouped within a dashed-line box, visually representing their internal network.  The arrows indicate the direction of information flow." /></p>
<p>Now the <strong>users</strong> connect only to the <strong>Load Balancer</strong> (public IP) and not directly to the servers. It communicates with the web servers through <strong>private IPs</strong> within the same network, making the servers unreachable from outside.</p>
<p>A <strong>Load Balancer</strong> also prevents downtime if one server goes offline. If <strong>Server 1</strong> fails, all requests are automatically redirected to <strong>Server 2</strong>, ensuring users don’t face interruptions. As traffic grows, we can keep adding more servers, and the load balancer will use its algorithm to <strong>distribute requests</strong> evenly among them. This makes the system both <strong>scalable</strong> and <strong>highly available</strong>.</p>
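<p>As a rough sketch of the behavior described above, here is a round-robin balancer that skips servers marked as down. Round-robin is only one of several possible algorithms, and the class name and private IP addresses are assumptions made for the example.</p>

```python
import itertools

class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

# Hypothetical private IPs for Server 1 and Server 2.
lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
first_four = [lb.route() for _ in range(4)]
print("normal:", first_four)

lb.mark_down("10.0.0.1")  # Server 1 goes offline
after_failure = [lb.route() for _ in range(3)]
print("after failure:", after_failure)
```

<p>In practice the balancer would detect failures itself through periodic health checks rather than an explicit <code>mark_down</code> call.</p>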
<h2 id="heading-database-replication">Database Replication</h2>
<p>With the <strong>Load Balancer</strong> in place, we solved the issue of servers going offline. But what if the same thing happens to our <strong>database</strong>? A single database failure would still bring the whole system down, which we need to avoid.</p>
<p>To keep the <strong>database</strong> safe, we can use <strong>Database Replication</strong>, where databases follow a <strong>master–slave relationship</strong>. The <strong>Master DB</strong> holds the original data and handles all <strong>write operations</strong> (insert, update, delete), while the <strong>Slave DBs</strong> maintain copies of that data and serve <strong>read operations</strong>. Since most systems perform more reads than writes, the number of <strong>slave databases</strong> is usually greater than the number of masters.</p>
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-6-L2YNDDKF.png&amp;w=3840&amp;q=75" alt="Image represents a simplified web application architecture.  A user, accessing via a web browser or mobile app, initiates a request to . This domain name is resolved to an IP address via a DNS server. The request then reaches a load balancer, which distributes traffic across two web servers ( and ) labeled as the 'Web tier'.  These servers communicate with a database system ('Data tier') consisting of a master database () and a slave database ().  The web servers send write requests to the master database and read requests to either the master or slave database. The master database replicates data to the slave database, ensuring data consistency and redundancy.  The load balancer uses  as the internal endpoint for communication with the web servers.  The entire architecture is visually divided into three tiers: the user tier (user and their access methods), the web tier (load balancer and web servers), and the data tier (master and slave databases).  The arrows indicate the flow of requests and data between components, with labels like 'Write,' 'Read,' and 'Replicate' clarifying the type of interaction." /></p>
<p>As we discussed earlier, the <strong>Load Balancer</strong> improves system availability for servers, and replication does the same for databases. If a <strong>Slave DB</strong> goes offline, the reads can temporarily go to the master, and if the <strong>Master DB</strong> fails, one of the slaves can be promoted to master so the system continues to run without downtime.</p>
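<p>The routing and failover rules above can be sketched as a small router that only decides which node a query should go to. The node names and class are hypothetical, and the sketch ignores the hard part of real failover, which is making sure the promoted replica has caught up on all writes.</p>

```python
import itertools

class ReplicatedDatabase:
    """Chooses a node per query: writes to the master, reads to replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._turn = itertools.count()

    def node_for_write(self):
        return self.master

    def node_for_read(self):
        # With no replica left, reads temporarily fall back to the master.
        if not self.replicas:
            return self.master
        return self.replicas[next(self._turn) % len(self.replicas)]

    def replica_failed(self, name):
        self.replicas.remove(name)

    def master_failed(self):
        # Promote the first available replica to be the new master.
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.master = self.replicas.pop(0)

db = ReplicatedDatabase("master-db", ["slave-db-1", "slave-db-2"])
print("write goes to:", db.node_for_write())
print("reads go to:", [db.node_for_read() for _ in range(2)])

db.master_failed()
print("new master:", db.master)
```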
<h2 id="heading-cache">Cache</h2>
<p>By now we’ve achieved good <strong>availability</strong>, but the next challenge is improving <strong>response time</strong>. To do this, we add a <strong>caching layer</strong> on top of the database. A <strong>cache</strong> is a high-speed storage layer that keeps frequently accessed or recently used data so it can be served much faster than fetching it directly from the database, disk, or an external API.</p>
<p><img src="https://bytebytego.com/images/courses/system-design-interview/scale-from-zero-to-millions-of-users/figure-1-7-GGNXNZX6.svg" alt="Image represents a simplified system architecture illustrating data retrieval from a cache and database.  The diagram shows three main components: a green rectangular 'Web server,' a blue square labeled 'CACHE' representing a cache, and a blue cylindrical 'Database' component labeled 'DB.'  A green arrow connects the cache to the web server, labeled '1. If data exists in cache, read data from cache,' indicating data flows from the cache to the web server. A blue arrow connects the database to the cache, labeled '2.1 If data doesn't exist in cache,...,' showing data retrieval from the database to the cache when the data is not found in the cache. Another blue arrow connects the cache back to the web server, labeled '2.2 Return data to the web server,' indicating the data's return path to the web server after being fetched either from the cache or the database.  The overall flow depicts a common caching strategy where the web server first checks the cache; if the data is present, it's directly returned; otherwise, the database is queried, the data is retrieved, stored in the cache, and then returned to the web server." /></p>
<p>When a request comes in, the <strong>web server</strong> first checks the cache. If the data is found, it’s returned immediately; if not, the server fetches it from the <strong>database</strong>, stores it in the cache, and then sends it to the client. This is called a <strong>read-through cache</strong> (when the application code itself performs the lookup and the fill, as described here, the same pattern is often called cache-aside). Depending on the use case, different caching strategies can be applied, and we can store things like <strong>JSON responses</strong>, <strong>JS files</strong>, or other static content to speed up performance.</p>
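<p>The check-cache-then-database flow can be sketched as follows. The in-memory dictionary stands in for a real cache server, the <code>DATABASE</code> dictionary stands in for the database, and the 60-second expiry is an arbitrary assumption.</p>

```python
import time

class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat it as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

DATABASE = {"user:1": {"name": "Asha"}}  # stand-in for the real database
db_reads = []                            # records every database access

def load_user(cache, key):
    cached = cache.get(key)
    if cached is not None:
        return cached          # cache hit: no database access
    db_reads.append(key)       # cache miss: go to the database
    value = DATABASE[key]
    cache.set(key, value)
    return value

cache = TTLCache(ttl_seconds=60)
load_user(cache, "user:1")
load_user(cache, "user:1")
print("database reads:", len(db_reads))
```

<p>The expiry time is the main tuning knob: too short and the database is hit often, too long and users may see stale data.</p>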
<h3 id="heading-when-to-use-caching">When to use caching?</h3>
<ul>
<li><p><strong>Frequent Reads</strong> – The same data is requested repeatedly, so caching avoids recomputing or re-fetching it.</p>
</li>
<li><p><strong>Slow Data Source</strong> – The original source (like a database, disk, or external API) is slower than memory, so caching speeds things up.</p>
</li>
<li><p><strong>High Latency</strong> – If accessing the main source takes noticeable time, caching reduces response times.</p>
</li>
<li><p><strong>Performance Improvement</strong> – By serving data from cache, you reduce load on the server or database.</p>
</li>
<li><p><strong>Cost Efficiency</strong> – If data queries or API calls are expensive, caching lowers usage and costs.</p>
</li>
</ul>
<h2 id="heading-content-delivery-network-cdn">Content Delivery Network (CDN)</h2>
<p>A <strong>CDN</strong> is a global network of servers that <strong>cache and deliver static content</strong> like images, CSS, JS, or videos. Instead of always fetching files from the origin server, users get them from the <strong>nearest CDN server</strong>, which makes websites load much faster. For example, a user in <strong>Delhi</strong> will get content quicker from a <strong>Mumbai CDN server</strong> than from one in the US.</p>
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-10-E6HDAMPH.png&amp;w=3840&amp;q=75" alt="Image represents a system architecture illustrating how a Content Delivery Network (CDN) serves images to users.  Two users, labeled 'User A' and 'User B,' are depicted as laptops.  Each user requests an image ('image.png') from the CDN, represented as a light-blue cloud with a lightning bolt symbolizing speed. Solid arrows indicate the requests (labeled '1. get image.png' and '5. get image.png') and responses ('4. return image.png' and '6. return image.png') between the users and the CDN.  If the CDN doesn't have the image, dashed arrows show a request ('2. if not in CDN, get image.png from server') to a green rectangular 'Server' component, which then sends the image to the CDN ('3. store image.png in CDN').  This ensures that subsequent requests for the same image from other users are served quickly from the CDN's cache, improving performance and reducing load on the server." /></p>
<ul>
<li><p><strong>User Request</strong> – User A requests an image via a CDN URL (e.g., CloudFront or Akamai).</p>
</li>
<li><p><strong>Cache Miss</strong> – If the CDN doesn’t have it, it fetches the file from the origin server/storage (e.g., Amazon S3).</p>
</li>
<li><p><strong>Origin Response</strong> – The origin sends the file with a TTL (how long it should stay cached).</p>
</li>
<li><p><strong>CDN Cache</strong> – The CDN stores the file and delivers it to User A.</p>
</li>
<li><p><strong>Another Request</strong> – User B requests the same image.</p>
</li>
<li><p><strong>Cache Hit</strong> – The CDN serves the image directly from its cache (until TTL expires).</p>
</li>
</ul>
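<p>The request flow in the steps above can be simulated with a couple of edge servers that each keep their own cache. The edge names, the city-to-edge mapping, and the file contents are invented, and TTL expiry is left out to keep the sketch short.</p>

```python
ORIGIN = {"/image.png": b"fake image bytes"}  # stand-in for origin storage
origin_fetches = []                           # records every trip to the origin

class EdgeServer:
    def __init__(self, name):
        self.name = name
        self.cache = {}

    def get(self, path):
        if path in self.cache:
            return self.cache[path], "HIT"
        # Cache miss: fetch from the origin and keep a copy at the edge.
        origin_fetches.append((self.name, path))
        self.cache[path] = ORIGIN[path]
        return self.cache[path], "MISS"

# Hypothetical edges and a hypothetical "nearest edge" lookup table.
EDGES = {"mumbai": EdgeServer("mumbai"), "virginia": EdgeServer("virginia")}
NEAREST_EDGE = {"Delhi": "mumbai", "New York": "virginia"}

def fetch(user_city, path):
    edge = EDGES[NEAREST_EDGE[user_city]]
    _body, status = edge.get(path)
    return edge.name, status

user_a = fetch("Delhi", "/image.png")     # first request: miss at the edge
user_b = fetch("Delhi", "/image.png")     # same edge: served from its cache
user_c = fetch("New York", "/image.png")  # different edge: its own first miss
print(user_a, user_b, user_c)
print("origin fetches:", len(origin_fetches))
```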
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-11-VI5Z74Q2.png&amp;w=3840&amp;q=75" alt="Image represents a system architecture diagram illustrating a typical web application deployment.  The diagram starts with a user accessing the application via a web browser or mobile app, which then sends a request to a DNS server.  The DNS resolves the domain names (www.mysite.com and api.mysite.com) and directs the request to a load balancer. The load balancer distributes traffic across two web servers (Server1 and Server2) within a 'Web tier'. These servers communicate with a database system in a 'Data tier', consisting of a master database (Master DB) and a slave database (Slave DB) with replication occurring from the master to the slave.  Both web servers also connect to a separate cache (labeled 'CACHE') for improved performance.  The entire system is connected to a CDN (Content Delivery Network) for faster content delivery to users globally.  Solid lines represent primary data flow, while dashed lines indicate secondary or replicated data flow.  Green lines highlight the connection between the web servers and the cache." /></p>
<h2 id="heading-stateless-architecture">Stateless Architecture</h2>
<p>In a <strong>stateful architecture</strong>, the server remembers <strong>client data</strong> from one request to the next. For example, in an <strong>online banking system</strong>, the server keeps track of your <strong>session</strong> (login details, account info, and transactions) across multiple steps. In a <strong>stateless architecture</strong>, by contrast, an HTTP request can be sent to any of the servers, because the servers do not keep client data themselves; they read it from a shared data store instead.</p>
<p>By moving <strong>state data</strong> out of the web servers, we make <strong>auto-scaling</strong> much easier. Now, servers can be added or removed based on <strong>traffic load</strong> without worrying about losing session data, making the system more <strong>flexible and scalable</strong>.</p>
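<p>Here is a minimal sketch of that idea: session data lives in a shared store, so any server can handle any request. The dictionary stands in for a shared cache or NoSQL database, and the class and method names are made up for the example.</p>

```python
import uuid

# Stand-in for a shared session store that every web server can reach.
SESSION_STORE = {}

class WebServer:
    def __init__(self, name):
        self.name = name  # note: no per-user data is kept on the server

    def login(self, username):
        session_id = uuid.uuid4().hex
        SESSION_STORE[session_id] = {"user": username}
        return session_id

    def handle(self, session_id):
        session = SESSION_STORE.get(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"

server1, server2 = WebServer("server1"), WebServer("server2")

sid = server1.login("asha")   # the login happens on server1
reply = server2.handle(sid)   # the next request lands on server2
print(reply)
```

<p>Since neither server holds the session, either one can be removed or replaced by the auto-scaler without logging anyone out.</p>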
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-14-CCBCQMO6.png&amp;w=3840&amp;q=75" alt="Image represents a system architecture diagram for a web application.  Users access the application via web browsers or mobile apps, initially resolving  (for web) or  (for mobile) through a DNS server.  These requests are then routed to a CDN (Content Delivery Network) for faster content delivery.  The requests subsequently reach a load balancer, distributing traffic across four application servers (Server1-Server4) which are auto-scaled (indicated by '① Auto scale').  These servers connect to a database system consisting of a master database and two slave databases, with replication occurring between the master and slaves.  Additionally, the servers interact with a cache for improved performance and a NoSQL database, likely for specific data storage needs.  Connections between the servers and databases are shown as dashed lines, suggesting asynchronous communication.  The green line indicates a connection from Server3 to the cache, while the purple line shows a connection from Server3 to the NoSQL database.  The blue lines represent the main flow of requests and data." /></p>
<h2 id="heading-data-centers">Data Centers</h2>
<p>As the <strong>user base</strong> grows globally, a single server location is no longer enough. To reduce <strong>latency</strong>, we add multiple <strong>data centers</strong> around the world. Suppose a website has <strong>Data Center 1 in Mumbai</strong> and <strong>Data Center 2 in New York</strong>.</p>
<ul>
<li><p>A user in Delhi is routed to the Mumbai data center → faster response.</p>
</li>
<li><p>A user in San Francisco is routed to the New York data center.</p>
</li>
</ul>
<p>When a <strong>user request</strong> comes in, it flows through <strong>DNS</strong>, then to the nearest <strong>CDN</strong>, and finally reaches the <strong>Load Balancer</strong>, which uses <strong>geo-routing</strong> to direct it to the closest data center. Inside, the web servers work with <strong>caches</strong> and <strong>databases</strong> to serve the response. If one data center fails, traffic is automatically rerouted to a healthy one, while <strong>data synchronization</strong> ensures consistency across all centers.</p>
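<p>Geo-routing with failover can be sketched as a preference list per region: try the nearest data center first and fall back to the next healthy one. The region names and the routing table are assumptions for illustration.</p>

```python
DATA_CENTERS = {
    "mumbai": {"healthy": True},
    "new-york": {"healthy": True},
}

# Hypothetical preference order per user region, nearest first.
ROUTING_TABLE = {
    "india": ["mumbai", "new-york"],
    "us-west": ["new-york", "mumbai"],
}

def route(region):
    """Return the nearest data center that is currently healthy."""
    for dc in ROUTING_TABLE[region]:
        if DATA_CENTERS[dc]["healthy"]:
            return dc
    raise RuntimeError("no healthy data center")

normal = route("india")
print("Delhi user, normal day:", normal)

DATA_CENTERS["mumbai"]["healthy"] = False  # simulate an outage in Mumbai
during_outage = route("india")
print("Delhi user, during outage:", during_outage)
```

<p>The fallback only helps if the second data center already has the user's data, which is why synchronization between centers matters.</p>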
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-15-GICUI26J.png&amp;w=3840&amp;q=75" alt="Image represents a system architecture diagram for a website.  A user, accessing via web browser (www.mysite.com) or mobile app (api.mysite.com), initiates a request that first resolves through a DNS server.  The request then proceeds to a CDN (Content Delivery Network) for caching and faster delivery.  From the CDN, the request hits a load balancer, which distributes traffic across two geographically separate data centers (DC1: US-East and DC2: US-West) based on geo-routing. Each data center contains multiple web servers, which in turn access databases and caches for data retrieval.  The web servers are connected to their respective databases and caches.  Additionally, both data centers' web servers connect to a central NoSQL database via thick purple lines, suggesting a shared data layer or a specific data synchronization mechanism.  The connections between web servers and their respective caches are shown in green and blue, while the connections to the NoSQL database are shown in purple.  The load balancer uses geo-routing to direct requests to the closest data center, optimizing latency." /></p>
<h2 id="heading-message-queues">Message Queues</h2>
<p>A <strong>Message Queue</strong> is a system that stores messages and lets services communicate <strong>asynchronously</strong>. It helps <strong>decouple producers and consumers</strong>, making applications more <strong>scalable and reliable</strong>, since messages can still be processed even if one side is temporarily unavailable. Producers publish messages to the queue, and consumers pick them up whenever they’re ready to process them.</p>
<p><img src="https://bytebytego.com/images/courses/system-design-interview/scale-from-zero-to-millions-of-users/figure-1-17-J2NLNRNY.svg" alt="Image represents a producer-consumer architecture using a message queue.  A rectangular box labeled 'Producer' is connected via a solid arrow labeled 'publish' to a hexagonal box representing a 'Message Queue'.  Inside the message queue are three envelope icons, symbolizing messages.  The message queue is connected to a rectangular box labeled 'Consumer' via two arrows. A solid arrow labeled 'consume' indicates the flow of messages from the queue to the consumer. A dashed arrow labeled 'subscribe' points from the consumer back to the message queue, illustrating the consumer's subscription to the queue for receiving messages.  Below the diagram, the text 'Viewer MessageQueue.svg 1.1.1' indicates the diagram's source and version." /></p>
<p>A <strong>message queue</strong> is like a waiting line where tasks (messages) are stored until someone picks them up. For example, when we place an order on an e-commerce platform, the <strong>inventory update</strong> and <strong>report generation</strong> don’t have to happen instantly; they are handled by background queues.</p>
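<p>The producer and consumer flow can be sketched in plain Java with an in-memory <code>LinkedBlockingQueue</code>. This is only a stand-in for a real message broker: the class <code>OrderQueueDemo</code>, the <code>STOP</code> marker, and the order IDs are all hypothetical names chosen for the example.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class OrderQueueDemo {
    // Hypothetical marker that tells the consumer no more messages are coming.
    private static final String STOP = "__STOP__";

    static List<String> run(List<String> orderIds) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        // Consumer: picks up messages whenever it is ready and blocks while the queue is empty.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();
                    if (message.equals(STOP)) break;
                    processed.add("inventory updated for " + message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producer: publishes each order and moves on without waiting for it to be processed.
        for (String orderId : orderIds) {
            queue.add(orderId);
        }
        queue.add(STOP);

        try {
            consumer.join(); // wait here only so the demo can return the results
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the consumer", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("order-1", "order-2")));
    }
}
```

<p>The producer never calls the consumer directly; the queue is the only thing they share, which is what keeps the two sides decoupled.</p>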
<h2 id="heading-logs-metrics-automation">Logs, Metrics, Automation</h2>
<p>Now that our website has grown, we need to invest in logging, metrics, and automation:</p>
<ul>
<li><p><strong>Logging</strong> → Keeps a record of what’s happening in the system (errors, requests, events). Needed for debugging, audits, and finding issues fast.</p>
</li>
<li><p><strong>Metrics</strong> → Numbers that show system health (CPU, memory, response time, traffic). Needed to measure performance and know when to scale or fix something.</p>
</li>
<li><p><strong>Automation</strong> → Automatically handles deployments, scaling, monitoring, and recovery. Needed to reduce human error, speed up processes, and keep systems reliable.</p>
</li>
</ul>
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-19-MOPDW7TD.png&amp;w=3840&amp;q=75" alt="Image represents a system architecture diagram for a web application.  A user, accessing via web browser (www.mysite.com) or mobile app (api.mysite.com), initiates a request that first goes through a DNS server.  The request then reaches a load balancer, distributing traffic to multiple web servers within a data center (DC1).  The web servers interact with databases and caches for data retrieval.  A green arrow shows the web servers using caches. A blue arrow shows the web servers using databases.  A purple arrow indicates that after processing, the web servers send messages to a message queue.  These messages are then processed by a set of workers, which subsequently write data to a NoSQL database.  The entire system is fronted by a CDN (Content Delivery Network) for faster content delivery.  Finally, a separate component labeled 'Tools' (2) at the bottom shows monitoring, logging, metrics, and automation functionalities, suggesting a robust operational monitoring and management system." /></p>
<h2 id="heading-database-scaling"><strong>Database scaling</strong></h2>
<p>As the data grows, the database gets overloaded, and we need ways to fix this. We can use the following approaches.</p>
<h3 id="heading-vertical-scaling">Vertical scaling</h3>
<p>It means improving a single server’s capacity by adding more resources like <strong>CPU, RAM, or storage</strong>. Example: upgrading memory from <strong>8 GB to 32 GB</strong> allows the server to handle more traffic.</p>
<h3 id="heading-horizontal-scaling">Horizontal scaling</h3>
<p>It means adding more <strong>servers</strong> instead of making one server stronger. For instance, you can deploy 5 servers and use a <strong>Load Balancer</strong> to spread the traffic among them.</p>
<h3 id="heading-sharding">Sharding</h3>
<p>It is a way of splitting a large database into smaller pieces (called <em>shards</em>), where each shard holds a portion of the data. Example: Instead of one database storing data for <strong>all users</strong>, you split it so users A–M are stored in <strong>Shard 1</strong>, and users N–Z are stored in <strong>Shard 2</strong>.</p>
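<p>The A–M / N–Z split can be expressed as a small routing function. This is a minimal sketch of range-based sharding, assuming the shard key is the first letter of the username; the class name <code>ShardRouter</code> is hypothetical, and sending names that do not start with a letter to Shard 2 is an assumption made only to keep the example complete.</p>

```java
class ShardRouter {
    /**
     * Returns the shard number (1 or 2) that stores the given user.
     * Usernames starting with A–M go to shard 1, N–Z go to shard 2.
     */
    static int shardFor(String username) {
        if (username == null || username.isEmpty()) {
            throw new IllegalArgumentException("username must not be empty");
        }
        char first = Character.toUpperCase(username.charAt(0));
        // Assumption: anything that is not A–M (including digits) lands in shard 2.
        return (first >= 'A' && first <= 'M') ? 1 : 2;
    }

    public static void main(String[] args) {
        System.out.println("alice -> shard " + shardFor("alice"));
        System.out.println("nina  -> shard " + shardFor("nina"));
    }
}
```

<p>Every read and write for a user goes through the same function, so the application always knows which shard to query.</p>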
<p><img src="https://bytebytego.com/_next/image?url=%2Fimages%2Fcourses%2Fsystem-design-interview%2Fscale-from-zero-to-millions-of-users%2Ffigure-1-23-3IYFN6Q6.png&amp;w=3840&amp;q=75" alt="Image represents a system architecture diagram for a web application.  A user, accessing via web browser (www.mysite.com) or mobile app (api.mysite.com), initiates a request that first resolves through a DNS server. The request then goes to a CDN (Content Delivery Network) before reaching a load balancer distributing traffic across multiple web servers within a data center (DC1).  These web servers interact with a sharded database (labeled 'Databases,' numbered 1), a cache layer for improved performance, and a message queue.  Data is also written to a NoSQL database (labeled 'NoSQL,' numbered 2).  A separate set of workers processes tasks from the message queue.  Finally, a 'Tools' section at the bottom shows components for logging, metrics, monitoring, and automation, suggesting a robust system monitoring and management infrastructure.  The connections between components show the flow of requests and data, with green lines indicating data flow to the cache, purple lines indicating data flow to the NoSQL database, and blue lines representing the main request flow." /></p>
<p>After implementing these steps, our architecture can gracefully handle <strong>millions of users and beyond</strong>. But system design is never truly “finished”; it’s an <strong>iterative process</strong> where we continuously refine, <strong>decouple layers</strong>, add more <strong>caching strategies</strong>, and adjust components as the system grows.</p>
<p>Thanks for reading! 🎉 A lot of these learnings are inspired by the amazing content from ByteByteGo. If you’re serious about <strong>system design</strong>, I highly recommend checking out their course.</p>
]]></content:encoded></item><item><title><![CDATA[How GZIP Compression Made My Spring Boot App 80% Lighter]]></title><description><![CDATA[Recently, while working on my Spring Boot backend, I stumbled on one of those optimizations that gives huge results for minimal effort: GZIP compression.
No complex setups. No code changes. Just a few config tweaks and boom—API payloads dropped by 80...]]></description><link>https://blog.raghavdev.in/how-gzip-compression-works</link><guid isPermaLink="true">https://blog.raghavdev.in/how-gzip-compression-works</guid><category><![CDATA[Springboot]]></category><category><![CDATA[gzip]]></category><category><![CDATA[performance]]></category><category><![CDATA[backend]]></category><category><![CDATA[Java]]></category><category><![CDATA[APIs]]></category><category><![CDATA[optimization]]></category><dc:creator><![CDATA[Raghav  Shukla]]></dc:creator><pubDate>Sun, 18 May 2025 12:52:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/EUzk9BIEq6M/upload/9dc4a0d2d5480e876b41a6e5c65a2673.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently, while working on my Spring Boot backend, I stumbled on one of those optimizations that gives <strong>huge results for minimal effort</strong>: <strong>GZIP compression</strong>.</p>
<p>No complex setups. No code changes. Just a few config tweaks and boom—<strong>API payloads dropped by 80%</strong> in size.<br />Let me walk you through what GZIP is, how I enabled it, and the massive impact it had on my app’s performance.</p>
<hr />
<h2 id="heading-what-even-is-gzip">🤔 What Even <em>Is</em> GZIP?</h2>
<p>Think of GZIP like a vacuum pack for your data—shrinks it down, zips it across the wire, and the browser puffs it back up instantly.</p>
<p>GZIP is a lossless compression algorithm that reduces the size of your HTTP responses before they're sent to the client.</p>
<p>It's like zipping a file before sending it. The browser automatically unzips it, so the user receives the same data, just more quickly.</p>
<h3 id="heading-why-gzip-rocks">Why GZIP Rocks:</h3>
<ul>
<li><p>📉 Smaller payloads = faster APIs</p>
</li>
<li><p>📱 Better experience on slow networks</p>
</li>
<li><p>📈 Boosts SEO (Google <em>loves</em> fast apps)</p>
</li>
<li><p>💰 Reduces bandwidth costs</p>
</li>
</ul>
<hr />
<h2 id="heading-setting-up-gzip-in-spring-boot">⚙️ Setting Up GZIP in Spring Boot</h2>
<p>You don’t need a library or dependency—just flip a few switches in <code>application.properties</code>:</p>
<pre><code class="lang-markdown"><span class="hljs-section"># Enable compression</span>
server.compression.enabled=true

<span class="hljs-section"># Only compress responses larger than 1KB</span>
server.compression.min-response-size=1024

<span class="hljs-section"># Compress these content types</span>
server.compression.mime-types=application/json,application/xml,text/html,text/xml,text/plain
</code></pre>
<p>Here’s what each line does:</p>
<ul>
<li><p><code>enabled=true</code>: Turns on compression</p>
</li>
<li><p><code>mime-types</code>: Targets responses like JSON, HTML, text</p>
</li>
<li><p><code>min-response-size=1024</code>: Compress only if it’s bigger than 1KB</p>
</li>
</ul>
<p>That’s it. Restart your app, and you’re good to go.</p>
<hr />
<h2 id="heading-testing-it-out">🔍 Testing It Out</h2>
<p>To confirm it’s working:</p>
<ul>
<li><p><strong>Postman</strong>: Look for <code>Content-Encoding: gzip</code> in the response headers.</p>
</li>
<li><p><strong>Chrome DevTools</strong> → Network tab: Same thing, check the response headers.</p>
</li>
<li><p>Or use <code>curl</code>:</p>
</li>
</ul>
<pre><code class="lang-bash">curl -H <span class="hljs-string">"Accept-Encoding: gzip"</span> -I http://localhost:8080/api/your-endpoint
</code></pre>
<hr />
<h2 id="heading-before-vs-after-the-real-impact">📊 Before vs After: The Real Impact</h2>
<p>I tested a few of my endpoints that return a large JSON array. Here’s what changed:</p>
<h3 id="heading-before-gzip">Before Gzip</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747571043156/36fc0159-443c-4f44-80c0-21f08855d9dd.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-after-gzip">After Gzip</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747571061378/72d6b73d-7744-406d-8ba3-30f674750a3a.jpeg" alt class="image--center mx-auto" /></p>
<p>GZIP shrinks the response before it goes over the network, so less data has to travel and it arrives sooner, which improves load times. Because the compression is lossless, the client gets back exactly the original bytes after decompression, so the faster transfer does not cost anything in data integrity.</p>
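<p>The "smaller on the wire, identical after decompression" behaviour can be checked with nothing but the JDK. The sketch below uses <code>java.util.zip</code> directly; it is not what Spring Boot runs internally, and the class <code>GzipRoundTrip</code> and the sample JSON payload are made up for illustration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compresses the given bytes with GZIP.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and the GZIP trailer written) before we read the bytes.
        return out.toByteArray();
    }

    // Restores the original bytes from GZIP-compressed data.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical payload: a repetitive JSON array, similar in shape to a list endpoint.
    static byte[] samplePayload() {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < 500; i++) {
            if (i > 0) json.append(',');
            json.append("{\"id\":").append(i).append(",\"status\":\"ACTIVE\"}");
        }
        return json.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = samplePayload();
        byte[] compressed = compress(original);
        byte[] restored = decompress(compressed);
        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("lossless:         " + Arrays.equals(original, restored));
    }
}
```

<p>Repetitive JSON like this compresses especially well, which is why list endpoints tend to benefit the most.</p>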
<hr />
<h2 id="heading-few-tips">💡 A Few Tips</h2>
<ul>
<li><p>🔁 Pair with caching (GZIP + Redis = ultra-fast)</p>
</li>
<li><p>📶 Test on slow networks (e.g., Chrome’s 3G throttle)</p>
</li>
<li><p>🖥️ Watch CPU usage—compression takes some cycles</p>
</li>
</ul>
<hr />
<h2 id="heading-references">🔗 References</h2>
<ul>
<li><p><a target="_blank" href="https://docs.spring.io/spring-boot/how-to/webserver.html#howto.webserver.enable-response-compression">Spring Boot Compression Docs</a></p>
</li>
<li><p><a target="_blank" href="https://gzip.org/">GZIP Website</a></p>
</li>
</ul>
<hr />
<h2 id="heading-wrapping-it-up">🎯 Wrapping It Up</h2>
<p>GZIP gave my Spring Boot app a major speed boost for practically no effort. It’s one of those tiny tweaks that should be in every dev’s toolbox.<br />Got an API? Turn on GZIP. You’ll thank yourself later. 😄</p>
<p>🗣️ <strong>Got any other backend speed hacks? I’d love to hear them—drop them in the comments or hit me up!</strong></p>
]]></content:encoded></item><item><title><![CDATA[Reddit Backend Project Using Microservice Architecture]]></title><description><![CDATA[What are Microservices
Microservices represent a software architecture approach where a complex application is built as a collection of small, independent services, each running in its process and communicating with others through well-defined APIs. ...]]></description><link>https://blog.raghavdev.in/reddit-backend-project-using-microservice-architecture</link><guid isPermaLink="true">https://blog.raghavdev.in/reddit-backend-project-using-microservice-architecture</guid><category><![CDATA[reddit]]></category><category><![CDATA[Springboot]]></category><category><![CDATA[Java]]></category><category><![CDATA[MongoDB]]></category><category><![CDATA[Spring]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[API Gateway]]></category><category><![CDATA[Spring Cloud Config Server ]]></category><category><![CDATA[java8]]></category><dc:creator><![CDATA[Raghav  Shukla]]></dc:creator><pubDate>Thu, 17 Oct 2024 18:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/0FytazjHhxs/upload/c0f01b0b9dfdf7a19ac216c7dc263a90.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-what-are-microservices">What are Microservices</h2>
<p>Microservices represent a software architecture approach where a complex application is built as a collection of small, independent services, each running in its own process and communicating with others through well-defined APIs. The key principle is to decouple various functionalities into modular services, promoting flexibility, scalability, and ease of maintenance. Each microservice is responsible for a specific business capability and can be developed, deployed, and scaled independently.</p>
<h3 id="heading-advantages">Advantages</h3>
<ul>
<li><p>Enhanced modularity: Microservices break down applications into smaller, independent services.</p>
</li>
<li><p>Scalability: Each microservice can be scaled independently to meet specific demands.</p>
</li>
<li><p>Parallel development: Developers can work on individual components without affecting the entire system.</p>
</li>
<li><p>Flexibility: Easy integration of new features and updates without disrupting the entire application.</p>
</li>
<li><p>Resource optimization: Efficient utilization of resources by scaling specific microservices as needed.</p>
</li>
</ul>
<h3 id="heading-overview">Overview</h3>
<h2 id="heading-project-architecture">Project Architecture</h2>
<p>The project consists of 6 microservices. Each service is independent and loosely coupled, so a change or failure in one doesn't affect the others.</p>
<p>Java version: 17, Dependency Management: Maven, Spring Boot version: 3.2+</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707633304548/5542a916-77cb-4890-8701-4d69a54bc6f0.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-spring-cloud-config">Spring Cloud Config</h3>
<p>Spring Cloud Config is a tool in the Spring Cloud ecosystem that centralizes and manages application configuration settings. Instead of hardcoding configuration in every service, we can fetch it from a central repository, which makes the configuration easier to manage and update.<br />It also supports per-environment configuration through a separate YML file for each environment, and it lets configuration be updated dynamically without redeploying the services.</p>
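<p><code>spring-cloud-dependencies</code> is a BOM, so it is normally imported under <code>&lt;dependencyManagement&gt;</code> with <code>&lt;type&gt;pom&lt;/type&gt;</code> and <code>&lt;scope&gt;import&lt;/scope&gt;</code>, while the config client itself typically comes from the <code>spring-cloud-starter-config</code> starter. I used the following dependency for Spring Cloud Config:</p>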
<pre><code class="lang-svelte"><span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">dependency</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">groupId</span>&gt;</span>org.springframework.cloud<span class="hljs-tag">&lt;/<span class="hljs-name">groupId</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">artifactId</span>&gt;</span>spring-cloud-dependencies<span class="hljs-tag">&lt;/<span class="hljs-name">artifactId</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">version</span>&gt;</span>$</span><span class="javascript">{spring-cloud.version}</span><span class="xml"><span class="hljs-tag">&lt;/<span class="hljs-name">version</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">dependency</span>&gt;</span></span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707685598194/ab3ed630-9719-4918-9486-a2ed4d73b36a.png" alt="Snapshot of Config File" class="image--center mx-auto" /></p>
<h3 id="heading-discovery-service">Discovery Service</h3>
<p>This project uses Eureka as the discovery service. It helps microservices find and communicate with each other in a distributed system. Each service registers itself with Eureka, and other services look up its location when they need to interact with it.</p>
<p>Services locate and connect to each other automatically, without hardcoded locations, which makes the system more flexible and scalable.</p>
<p>Load balancing: incoming requests are distributed among multiple instances of a service to improve performance and reliability.</p>
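<p>The register, look up, and load-balance cycle can be sketched in plain Java. To be clear, this is not the Eureka API: <code>SimpleRegistry</code>, its methods, and the addresses are hypothetical, and the in-memory map only imitates what a discovery server with client-side round-robin balancing does conceptually.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimpleRegistry {
    // Service name -> addresses of its registered instances.
    private final Map<String, List<String>> instances = new HashMap<>();
    // Service name -> number of lookups so far, used for round-robin selection.
    private final Map<String, Integer> lookups = new HashMap<>();

    // Called by a service instance on startup to announce where it is running.
    void register(String serviceName, String address) {
        instances.computeIfAbsent(serviceName, key -> new ArrayList<>()).add(address);
    }

    // Called by a client: returns the next instance of the service in round-robin order.
    String next(String serviceName) {
        List<String> addresses = instances.get(serviceName);
        if (addresses == null || addresses.isEmpty()) {
            throw new IllegalStateException("No instances registered for " + serviceName);
        }
        int index = lookups.merge(serviceName, 1, Integer::sum) - 1;
        return addresses.get(Math.floorMod(index, addresses.size()));
    }

    public static void main(String[] args) {
        SimpleRegistry registry = new SimpleRegistry();
        registry.register("user-service", "10.0.0.1:8081");
        registry.register("user-service", "10.0.0.2:8081");
        for (int i = 0; i < 3; i++) {
            System.out.println(registry.next("user-service"));
        }
    }
}
```

<p>Callers only know the name <code>user-service</code>; which instance answers is decided at lookup time, so instances can be added or removed without touching the callers.</p>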
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707579996242/8a30a566-6d9b-4fda-90fa-e62ddfb85caa.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-api-gateway-service">API Gateway Service</h3>
<p>An API gateway acts as a central hub for managing and securing interactions between clients and backend services. It streamlines tasks like authentication, authorization, and routing, enhancing security, performance, and scalability. By providing a unified entry point, it simplifies client access and enables monitoring and analytics for better insights and decision-making. In summary, API gateways play a vital role in modern architecture by optimizing communication and abstracting complexities.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707685773378/006ad2a6-981d-4f09-a57c-a497389a5bc0.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-user-service">User Service</h3>
<p>Manages user-related operations such as user creation, retrieval, update, deletion, status change, searching, filtering, and association with subreddits, posts, and comments.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707685842156/4b9a2422-5926-482b-aff1-db9a20013548.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-subreddit-service">Subreddit Service</h3>
<p>Manages subreddits, offering functionalities for creating, retrieving, updating, and deleting subreddits. Additionally, it handles user membership management within subreddits, association of posts, and interaction with comments.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707685895542/886e1704-7dcf-47d4-be5e-2d79ce8e15d6.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-post-service">Post Service</h3>
<p>Comprising controllers for comments, posts, and votes, this service manages the core functionalities of the Reddit clone project. It facilitates the creation, retrieval, updating, and deletion of comments and posts, as well as the handling of voting interactions, ensuring a robust and engaging user experience.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707685963215/81175e29-d08b-4336-b791-7589241b2246.png" alt class="image--center mx-auto" /></p>
<p>Check out my project on GitHub</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/Raghav-byte/Reddit-Clone-Backend">https://github.com/Raghav-byte/Reddit-Clone-Backend</a></div>
]]></content:encoded></item></channel></rss>