Thanks for the comment! I only want to do it insofar as it is useful to people :)
@aforty1
11 days ago
Thanks for these videos!
@Henry-wq7ki
1 month ago
Great job as always
@idiot7leon
1 month ago
Brief Outline
00:00:22 Amazon
00:00:57 Problem Requirements
00:02:02 Capacity Estimates
00:03:37 High Level Overview
00:04:23 Product DB
00:06:24 Building a Cart
00:08:10 Building a Cart Continued
00:09:36 Avoiding Contention
00:12:16 Observe Remove Set
00:15:22 Orders Service
00:16:58 Avoiding Contention on Orders
00:19:42 Order Processing With Streams
00:22:43 Optimizing Reads - Popular Items
00:25:56 Optimizing Reads - Search Indices
00:28:22 Search Index Local Indexing
00:30:48 Populating Caches and Indices
00:32:40 Final Diagram - Amazon
Thanks, Jordan~
@hristianiliev7979
1 month ago
Awesome!
@shibhamalik1274
1 month ago
Hi @jordan. Awesome video. One question on using leaderless replication for the cart service: will the CRDT add read latency to the cart service, since it is doing some aggregation here? What is a CRDT here anyway? I mean, is it just a method of aggregation, or a predefined solution for such problems? How much added latency would this be? I understand your video is assuming we are pushing limits on the number of concurrent events on the cart, but I think this might not be what an interviewer would agree with, and it can lead to noise while giving the interview. What do you think?
@jordanhasnolife5163
1 month ago
Nope! Reading from a CRDT is just from one node. The anti entropy process between the multiple leaders is when the merging gets done.
@shibhamalik1274
1 month ago
Hi @jordan, do you have a video on how to configure a Spark cluster? Things like partition size, number of cores/partitions (same, I believe), number of executors, number of cores per executor, executor memory, etc.
@jordanhasnolife5163
1 month ago
I do not - I don't think this is overly relevant for the systems design interview though
@AjItHKuMaR-ns5yu
1 month ago
Hi Jordan. I have one question regarding the cart using a multi-leader setup. I agree this helps solve contention and that eventually we see the correct items in the cart. However, isn't the cart supposed to be strongly consistent? Suppose user A adds a couple of items on node 1 and user B adds some other item on node 2. What happens when user A clicks checkout before the two nodes exchange data and see the final state of the cart?
@jordanhasnolife5163
1 month ago
I was mostly just using this as an example to stretch the boundaries here. It doesn't *have* to be strongly consistent, but I generally agree that it should. With that being said, I think using a single leader to perform writes is totally fine.
@SunilKumar-qh2ln
1 month ago
for popular items read part, can we rely on LFU caching strategies for productid key everywhere where reads are happening, like product features display, similar items related to a product display etc
@jordanhasnolife5163
1 month ago
I don't see why not - any particular reason to opt for LFU over LRU?
@SunilKumar-qh2ln
1 month ago
@@jordanhasnolife5163 My thought process was that, let's say based on some capacity estimations, we can store info in the cache for only 10k popular items, so we limit the cache to ~10k keys, and any key that is more frequent (i.e. more popular) will stay while less frequent keys will be dropped. I get that with this we will not have the popularity score, which is used at a later point for the search index.
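To make the LFU idea above concrete, here is a minimal sketch in Python. The class name `LFUCache` and its methods are hypothetical, not from the video; it assumes a single process and uses a linear scan for eviction, which a real cache would avoid.

```python
from collections import Counter


class LFUCache:
    """Tiny LFU cache sketch: evicts the least frequently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}       # product_id -> cached product info
        self.hits = Counter()  # product_id -> access count

    def get(self, key):
        if key not in self.values:
            return None
        self.hits[key] += 1
        return self.values[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self.values and len(self.values) >= self.capacity:
            # O(n) scan for the coldest key; acceptable for a sketch only.
            coldest = min(self.values, key=lambda k: self.hits[k])
            del self.values[coldest]
            del self.hits[coldest]
        self.values[key] = value
        self.hits[key] += 1


cache = LFUCache(capacity=2)
cache.put("tissues", {"price": 3})
cache.put("eggs", {"price": 5})
cache.get("tissues")
cache.get("tissues")
cache.put("milk", {"price": 2})  # "eggs" has the fewest hits, so it is dropped
```

As the comment notes, the hit counts live only inside each cache instance, so this does not give the search index a global popularity score.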
@shibhamalik1274
1 month ago
Hi Jordan, what is your recommendation for synchronizing the product queues with the inventory changes queue? What I mean is: how should we deal with lag on the inventory changes queue when we don't want orders to get rejected because of that lag? How long do you think our waiting period should be if we add a hold, and is it OK for the customer to get emails x minutes later? What is your opinion on this?
@jordanhasnolife5163
1 month ago
Not entirely sure what you mean here, feel free to clarify. The second a customer order comes into flink, we check whether we have inventory for it. If we don't we'll reject them right away. In reality, kafka should not have a significant lag period, but if for whatever reason it went down that's why we have replicas! The bigger thing is to just make sure to update the product db when we see that we don't have stock left so that people can stop placing orders.
@dinar.mingaliev
1 month ago
Thanks Jordan for your work! I hope Megan keeps posting you messages :) and meanwhile I have a question about CRDT: what if client one adds 12 products A, client two add 3 products A and client three deletes for example 7. Looks like CRDT are not going to manage such situations?
@jordanhasnolife5163
1 month ago
Thanks! I actually ghosted Megan fox for Corinna kopf. Anyways, it depends whether the person deleting sees all of the tags for product A at the time of deletion. In your case, it sounds like they probably wouldn't, so product A would still be in the cart. That being said, assuming each client writes to the same leader, client 1 can't just add A to the set 12 times, it should already be present.
@Omran95
1 month ago
Thank you, very nice and clear explanation, I am just wondering which data store are we going to use to store the orders?
@jordanhasnolife5163
1 month ago
Seems pretty reasonable to me to just use SQL
@jianchengli8517
19 days ago
For the order handling part, if I have multiple items in a single order, how do the Flink nodes coordinate with each other to figure out if the whole order can be fulfilled? What happens if one of the items is out of stock? Are we sending out an email once per item in the order?
@jordanhasnolife5163
19 days ago
Unfortunately, we give up atomicity here. The only way to do this would actually be to have a second queue that takes all of the events, once again grouped by order id, and combining them (almost like my design for youtube videos). In my current design though, I'm okay with partial orders.
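The "second queue grouped by order id" idea can be sketched roughly as follows. This is plain Python rather than Flink code; `combine_order_results`, the event shape, and the status names are all assumptions made for illustration.

```python
from collections import defaultdict


def combine_order_results(events, expected_items):
    """Combine per-item stock results into one status per order.

    events: iterable of (order_id, product_id, in_stock) tuples, as if read
            from a queue partitioned by order id (hypothetical shape).
    expected_items: dict mapping order_id -> number of line items in the order.
    Returns a dict of order_id -> status for orders whose items have all arrived.
    """
    seen = defaultdict(list)
    statuses = {}
    for order_id, _product_id, in_stock in events:
        seen[order_id].append(in_stock)
        if len(seen[order_id]) == expected_items[order_id]:
            results = seen.pop(order_id)
            if all(results):
                statuses[order_id] = "fulfilled"
            elif any(results):
                statuses[order_id] = "partial"
            else:
                statuses[order_id] = "rejected"
    return statuses


events = [
    ("o1", "eggs", True),
    ("o2", "milk", False),
    ("o1", "milk", True),
    ("o2", "eggs", True),
    ("o3", "tea", False),  # o3 is still waiting on its second item
]
statuses = combine_order_results(events, {"o1": 2, "o2": 2, "o3": 2})
```

With a combiner like this, one email per order could be sent once the order's status is known, instead of one email per item.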
@managerbmr
1 month ago
Where the F did Elasticsearch come from? I didn't hear it mentioned until the last slide. Also, what happened to the Spark consumer for counting orders/clicks in the last slide?
@jordanhasnolife5163
1 month ago
I talk about inverted elastic search pretty frequently, so I feel pretty justified in saying that at this point not spending 10 minutes a video talking about it is a good thing. That being said, I've brought it up in a variety of these other videos!
@hukuna9957
1 month ago
I’m nodding like I understand, but I’m not so sure I do…
@jordanhasnolife5163
1 month ago
I'd recommend asking a specific question in that case
@shibhamalik1274
1 month ago
Hi Jordan while implementing can we not just use an epoch timestamp for events and then decide the eventual cart picture ? Thanks for all ur videos, they are super cool 😊
@jordanhasnolife5163
1 month ago
Yep! Keep in mind that epoch timestamps across distributed nodes are not perfect though due to clock drift.
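A small sketch of why clock drift matters when ordering cart events by timestamp. The function `last_write_wins` and the numbers are made up for illustration: node B's clock is assumed to run 5 seconds behind node A's.

```python
def last_write_wins(events):
    """Replay cart events in timestamp order and return the resulting cart.

    events: list of (timestamp, op, product) where op is "add" or "remove".
    """
    cart = set()
    for _ts, op, product in sorted(events, key=lambda e: e[0]):
        if op == "add":
            cart.add(product)
        else:
            cart.discard(product)
    return cart


# Real-world order: the user adds eggs on node A, then removes them on node B
# one second later. Node B's clock is 5 seconds slow, so the remove is
# stamped 96 instead of 101 and sorts *before* the add.
events = [
    (100, "add", "eggs"),     # stamped by node A
    (96, "remove", "eggs"),   # stamped by node B (true time: 101)
]
cart = last_write_wins(events)  # eggs are still in the cart
```

The user's last action was a remove, yet the timestamp-ordered replay keeps the item, which is the kind of error drift can cause.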
@Unleash132
25 days ago
If you split orders now you have to handle new problems like one of the products is unavailable so you have to undo all the other products amount decreases. Of course you would have to undo the order anyway if payment fails for example so it might not be a big issue. Another problem is if you decreased an amount of product for order A then you found out another product is missing so you undo it, meanwhile other orders that could've passed failed. Isn't it simpler just keeping the orders as is not splitting them and using atomic operations to inc/dec? for example in mongodb you have atomic operations and you can decrement the amount you need atomically while providing a filter with amount > 0 for each item, this way if you failed your update operation you know at least one of the items in the order is out of stock therefore the order can not be processed.
@jordanhasnolife5163
24 days ago
It's absolutely simpler - it just becomes a question of whether it scales! If you use atomics for all order operations you risk a lot of contention due to grabbing locks there. It's probably fine IRL, but for the sake of taking everything to the max for these videos, I've chosen to discuss the tradeoffs of having to revert pieces of orders after the fact. Presumably, you wouldn't cancel the entire order, but just the part that was out of stock.
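For reference, the "decrement with a stock filter" approach from the question could look roughly like this. The comment mentions MongoDB; this sketch swaps in SQLite so it runs with only the standard library, and the table, column, and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id TEXT PRIMARY KEY, stock INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("eggs", 5), ("milk", 0)])
conn.commit()


def reserve(conn, items):
    """Reserve every item in the order, or none of them.

    items: dict of product_id -> quantity.
    Each UPDATE only matches if enough stock remains; a miss aborts the
    whole transaction so earlier decrements are rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for product_id, qty in items.items():
                cur = conn.execute(
                    "UPDATE inventory SET stock = stock - ? "
                    "WHERE product_id = ? AND stock >= ?",
                    (qty, product_id, qty),
                )
                if cur.rowcount != 1:
                    raise LookupError(product_id)
    except LookupError:
        return False
    return True


def stock_of(conn, product_id):
    row = conn.execute(
        "SELECT stock FROM inventory WHERE product_id = ?", (product_id,)
    ).fetchone()
    return row[0]


failed = reserve(conn, {"eggs": 2, "milk": 1})  # milk is out of stock
succeeded = reserve(conn, {"eggs": 2})
```

This keeps the order atomic, at the cost of holding a write transaction across every product in the order, which is the contention tradeoff discussed in the reply.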
@shibhamalik1274
1 month ago
Hi Jordan How do we handle quantities while implementing CRDT for cart service ? Since the delete or remove 1 quantity would be same operation but that doesn’t mean that product is removed from cart. That product could have multiple quantities …
@jordanhasnolife5163
1 month ago
When I remove my product from the cart with the CRDT approach, I will create a remove operation for all of the tags on a given product that I currently see. If there happen to be other tags that I don't currently see, then those additions of the product will not be removed.
@martinwindsor4424
12 days ago
@@jordanhasnolife5163 hey jordan do you mean, multiple quantities of same item can have same tag when you add to crdt? When you want to remove you remove all the instances of the add operation with that tag? In that case, how would our cart DB look like? would it have different ids for each add operation but same tag?
@jordanhasnolife5163
12 days ago
@@martinwindsor4424 Each item that we add (every individual 1 lot quantity) can have a tag, or you can make it so that as an optimization when you add say 5 at a time the CRDT says ("eggs", 5, "kshfskjfdh"). Either will work. When someone removes, they read from their database all mentions of that product from their cart, and remove them all, including the associated tag.
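The tag-based behavior described in this thread can be sketched as a small observed-remove set. This is an illustrative in-memory version, assuming random UUIDs as tags; the class `ORSet` and its method names are hypothetical and it ignores quantities.

```python
import uuid


class ORSet:
    """Observed-remove set sketch: each add gets a unique tag, and a remove
    tombstones only the tags this replica has seen so far."""

    def __init__(self):
        self.adds = set()     # (element, tag) pairs
        self.removes = set()  # tombstoned (element, tag) pairs

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Only tags visible on this replica right now are removed.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        # Anti-entropy step: union both sets; reads stay local and cheap.
        self.adds |= other.adds
        self.removes |= other.removes

    def items(self):
        return {e for (e, _t) in self.adds - self.removes}


a, b = ORSet(), ORSet()
a.add("eggs")
b.merge(a)        # both replicas now see the same "eggs" tag
a.remove("eggs")  # replica A removes the tag it knows about
b.add("eggs")     # concurrently, replica B adds "eggs" under a new tag
a.merge(b)
b.merge(a)
```

After the merges both replicas agree that "eggs" is still in the cart, because the remove never observed the second tag. That matches the reply above: additions the remover did not see are not removed.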
@tunepa4418
1 month ago
For the cart service, since it’s not really a critical service, can we merge all the items in all the leaders and return that to the user when there’s a conflict and allow the user to resolve the conflict. I think Dynamo does this
@tanaygupta632
1 month ago
How is the cart service not a critical service?
@tunepa4418
1 month ago
@@tanaygupta632 it’s not so critical in the sense that having a conflict there won't be the end of the world
@jordanhasnolife5163
1 month ago
Yes you could store siblings and have the user merge them if we couldn't do the merging logic in the database. Keep in mind that this is still eventually consistent though, there will be a period where each leader is only aware of its own write and not the impending merge conflict.
@bogdax
1 month ago
Thanks for the video. Could you do a video on Leetcode? With a focus on the online contest and its leaderboard features.
@jordanhasnolife5163
1 month ago
Interesting. To me this seems like a combo of the job scheduler (for running the code in the cloud) as well as the leaderboard problem, which I'm doing next. Do you think that there are unique aspects to the problem outside of those?
@bogdax
1 month ago
@@jordanhasnolife5163 Thank you for doing the leaderboard! Other unique aspects could be plagiarism detection and handling time limits fairly. That time limit issue might get complex with different languages and hardware, but IDK, the ICPC has the same execution time limits no matter which language you use to solve their problems. The reason I'm asking for a video on this problem, is because Meta is currently asking it:)
@managerbmr
1 month ago
What's the difference between the Spark and the Flink processing? are they going to be 2 different consumers (flink and spark) or is the spark replacing the flink in the inventory quantity processing?
@jordanhasnolife5163
1 month ago
Hey yeah, in the final diagram I'm using fully flink, however generally speaking flink and spark streaming are basically the same, except spark streaming processes things in minibatches
@managerbmr
1 month ago
@@jordanhasnolife5163 thank you for clarifying, I just didn't know where did the spark go. Do you see any advantages in terms of infra or maintenance between spark and flink?
@jordanhasnolife5163
1 month ago
@@managerbmr none specifically that I've heard of in my own research!
@cromagnon0101
1 month ago
Hi Jordan, Thanks for the video! do you mind sharing your notes please? if that's feasible?
@jordanhasnolife5163
1 month ago
Hey! Will do this eventually, though probably in batch in a couple of months once this current series comes to a conclusion
@AjItHKuMaR-ns5yu
1 month ago
Hi Jordan. Thanks for clarifying my previous question. I have another question here. For the order service, I see that we avoid querying the products DB to know how many items of a product are available, and we rely only on the data present in Flink. I have a couple of questions about this approach.
1. The products DB is huge, and hence we have partitioned it. Can Flink hold the entire data set?
2. Let's say we partition Flink as well to accommodate the huge amount of data. What happens if Flink goes down? Will another instance of Flink continue serving requests? If so, when the first Flink node comes back up, it relies on its local state, while there might have been changes to the order count in the DB. How do we ensure the data in Flink is always consistent with the DB?
3. If we have a solution for 1 and 2, do we even need a DB? Isn't Flink taking care of everything?
I am sorry if I was not able to articulate my question clearly.
@jordanhasnolife5163
1 month ago
1) Flink is also partitioned, we have one flink consumer per kafka partition
2) Either another instance of flink, or one of the other partitions. See Flink checkpointing. All of the DB changes are still in kafka, flink can play them back. (I have a video dedicated to flink, multiple in fact)
3) Our DB is still our source of truth for product data, we just want inventory count in flink.
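The checkpoint-and-replay idea in point 2 can be illustrated with a toy consumer. This is plain Python, not the Flink API: `InventoryConsumer`, the list standing in for one Kafka partition, and the event shape are all assumptions.

```python
class InventoryConsumer:
    """Toy stateful consumer: keeps inventory counts in memory, snapshots
    (offset, state) as a checkpoint, and replays the log after a crash."""

    def __init__(self, log):
        self.log = log       # list of (product_id, delta); stands in for a partition
        self.offset = 0      # next log position to read
        self.stock = {}      # in-memory inventory counts
        self._checkpoint = (0, {})

    def process(self, upto):
        while self.offset < upto:
            product_id, delta = self.log[self.offset]
            self.stock[product_id] = self.stock.get(product_id, 0) + delta
            self.offset += 1

    def checkpoint(self):
        # Snapshot the state together with the offset it corresponds to.
        self._checkpoint = (self.offset, dict(self.stock))

    def crash_and_recover(self):
        # In-memory state is lost; restore the snapshot and replay the rest.
        self.offset, saved = self._checkpoint
        self.stock = dict(saved)
        self.process(len(self.log))


log = [("eggs", 10), ("eggs", -3), ("milk", 5), ("eggs", -2)]
consumer = InventoryConsumer(log)
consumer.process(2)
consumer.checkpoint()
consumer.process(4)
before_crash = dict(consumer.stock)
consumer.crash_and_recover()
after_recovery = dict(consumer.stock)
```

Because the log keeps every change and the snapshot records which offset it covers, the recovered counts match the pre-crash counts without consulting the DB.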
@managerbmr
1 month ago
Is there a payment in this process? how is that handled in this design?
@jordanhasnolife5163
1 month ago
I tend to not touch on payment systems too much as I suppose that's a video of its own, but I'd imagine you'd just use something like stripe and reach out to a bunch of third party apis to handle this for you.
@sandeepreddy6295
1 month ago
Hey Jordan, could you also do a video on Airbnb, please. Your videos are amazing.
@jordanhasnolife5163
1 month ago
How do you find that Airbnb differs from Yelp in the core design?
@sandeepreddy6295
1 month ago
I haven't seen the Yelp video yet. If it is more or less the same, then a video on Airbnb won't be needed. Thank you!
@i-am-the-slime
1 month ago
What does Flink actually do here? Why aren't the downstream things just connected to the Kafka topics?
@jordanhasnolife5163
1 month ago
Flink is good for performing aggregation, and we don't have to worry about fault tolerance very much given the way that it checkpoints state. We can use a normal server and have it listen to kafka, but if it goes down the state that we've kept in memory gets lost forever.
@i-am-the-slime
1 month ago
Thanks for not having a life and answering so quickly. I also watched some other videos and it's interesting because Flink seems different to most of these infrastructure projects since it's concerned more with processing the data than storing it. That's what was confusing me.
@yrfvnihfcvhjikfjn
1 month ago
I've been going through designing data intensive applications and using it to fill in the holes in the notes I took from your videos. It follows the same content in the same order 😅
@jordanhasnolife5163
1 month ago
Oh yeah I shamelessly ripped it off for the first 20 concepts videos no doubt about it
@andymaheshw
1 month ago
Can you do a video on figma?
@jordanhasnolife5163
1 month ago
Eventually, sure - I have to do some more research into how it actually works I'd say
@SicknessesPVP
1 month ago
Why do you want to buy tissues?
@jordanhasnolife5163
1 month ago
For my runny nose of course
@Rohit-hs8wp
22 days ago
Please clarify my doubts.
In your Kafka solution for avoiding contention when products are added to or removed from a cart concurrently, Kafka has to be partitioned on the hash of cart_id and also ensure exactly-once processing in order to be correct. If we allow at-least-once processing, then the cart service has to be idempotent. Suppose we design that idempotent cart service; then we have to keep a transaction_id, and the schema will look something like this:
CartTable(cart_id, transaction_id, product_id, user_updation_id, timestamp)
This table structure doesn't quickly tell us whether a product is in the cart or, if it is, what quantity of it is in the cart. We have to write a somewhat complex query like this to get both:
SELECT cart_id, product_id, SUM(CASE WHEN isRemoved = false THEN 1 ELSE 0 END) AS quantity
FROM Cart_table
GROUP BY cart_id, product_id;
Maybe I can use two indexes, one on (cart_id, transaction_id) for idempotency and another on (cart_id, product_id) for the query part, OR maybe I can construct another table Cart(cart_id, product_id, quantity) from this CartTable(cart_id, transaction_id, product_id, user_updation_id, timestamp) table by using stream processing with a tumbling window to speed up the query part. Of course you proposed a better design, but I want to know: if you went with this approach, how would you solve it?
@jordanhasnolife5163
21 days ago
Realistically, I think you're overthinking this one a bit. For a given user, we'll have like max 20 items in the cart. Just index on cart ID and product ID, and run the exact query that you mentioned, you can use this for idempotency too. It'll be fast enough due to the index
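A rough sketch of that table, index, and query, using SQLite so it runs standalone. Several pieces here are assumptions rather than anything stated in the thread: the `is_removed` column (the question's query references `isRemoved`, but its schema does not list it), the unique constraint on `(cart_id, transaction_id)` as the idempotency mechanism, and all function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cart_events (
        cart_id        INTEGER NOT NULL,
        transaction_id TEXT    NOT NULL,
        product_id     TEXT    NOT NULL,
        is_removed     INTEGER NOT NULL DEFAULT 0,  -- assumed soft-delete flag
        UNIQUE (cart_id, transaction_id)            -- assumed idempotency key
    );
    CREATE INDEX idx_cart_product ON cart_events (cart_id, product_id);
    """
)


def record_add(cart_id, transaction_id, product_id):
    # A redelivered message with the same transaction_id is silently ignored.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO cart_events (cart_id, transaction_id, product_id) "
            "VALUES (?, ?, ?)",
            (cart_id, transaction_id, product_id),
        )


def soft_remove(cart_id, product_id):
    with conn:
        conn.execute(
            "UPDATE cart_events SET is_removed = 1 "
            "WHERE cart_id = ? AND product_id = ?",
            (cart_id, product_id),
        )


def cart_quantities(cart_id):
    rows = conn.execute(
        "SELECT product_id, SUM(CASE WHEN is_removed = 0 THEN 1 ELSE 0 END) "
        "FROM cart_events WHERE cart_id = ? GROUP BY product_id",
        (cart_id,),
    )
    return dict(rows.fetchall())


record_add(1, "t1", "eggs")
record_add(1, "t1", "eggs")  # duplicate delivery, ignored
record_add(1, "t2", "eggs")
record_add(1, "t3", "milk")
soft_remove(1, "milk")
quantities = cart_quantities(1)
```

With at most a few dozen rows per cart, the grouped query only touches the rows for one `cart_id`, which is why a separate precomputed table is probably unnecessary.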
Comments: 66