How would we scale this geographically? Say we have 10K trucks in India, 10K trucks in the US, and 10K in Canada. How do we make sure that we are going to the right partition? Can we use (country, truckId, date) as the partition key?
@irtizahafiz
10 months ago
That can be an option, yes.
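A minimal sketch of what the (country, truckId, date) key from the question could look like. It uses a plain in-memory dictionary as a stand-in for the table, so `table`, `partition_key`, `write_location`, and `read_locations` are hypothetical names for illustration and not part of any Cassandra client.

```python
from collections import defaultdict
from datetime import date

# Hypothetical in-memory stand-in for a table: partition key -> list of rows.
table = defaultdict(list)

def partition_key(country: str, truck_id: str, day: date) -> tuple:
    """Build the composite partition key (country, truck_id, date)."""
    return (country, truck_id, day.isoformat())

def write_location(country: str, truck_id: str, day: date, lat: float, lon: float) -> None:
    """Append a location row to the partition chosen by the composite key."""
    table[partition_key(country, truck_id, day)].append((lat, lon))

def read_locations(country: str, truck_id: str, day: date) -> list:
    """Read with the full partition key, so the lookup touches one partition."""
    return table.get(partition_key(country, truck_id, day), [])

# The same truck id in two countries lands in two separate partitions.
write_location("IN", "truck-1", date(2023, 1, 5), 12.97, 77.59)
write_location("US", "truck-1", date(2023, 1, 5), 40.71, -74.00)
```

Because the country is part of the key, every read has to supply it, which is what routes the request to the right partition.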
@PhaseControlDNB
A year ago
When I asked ChatGPT what the best way is to query data across multiple partitions, it told me: one multi-partition query.
"In general, using a multi-partition query to retrieve data from multiple partitions in a single query can be faster than running multiple individual queries for each partition, especially when dealing with a large amount of data. Here are a few reasons why:
- Reduced network overhead: When you run multiple queries, each query requires a separate round-trip to the database, which can result in significant network overhead, especially when dealing with large amounts of data. A single multi-partition query requires only one round-trip to the database, which can be more efficient.
- Reduced CPU overhead: Running multiple queries can also result in increased CPU overhead on both the client and server, as each query requires separate processing and parsing. A single multi-partition query can reduce this overhead by combining multiple queries into a single request.
- Improved caching: When you run multiple queries, the database may need to read the same data multiple times, which can result in reduced cache efficiency. A single multi-partition query can improve caching by allowing the database to read and cache data once and then reuse it for multiple partitions."
@irtizahafiz
10 months ago
The approach is tightly dependent on both your schema and data volume. In my experience, querying per partition has been better, but that's not to say the other approach is wrong.
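A rough sketch of the per-partition approach: one read per partition key, issued concurrently and merged on the client. `PARTITIONS` and `query_partition` are hypothetical in-memory stand-ins. In a real application, `query_partition` would wrap a single-partition read through your database client.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for partitions keyed by (truck_id, date).
PARTITIONS = {
    ("truck-1", "2023-01-01"): ["a", "b"],
    ("truck-1", "2023-01-02"): ["c"],
    ("truck-1", "2023-01-03"): [],
}

def query_partition(key):
    """Stand-in for a single-partition read; a real client call would go here."""
    return PARTITIONS.get(key, [])

def query_many(keys, max_workers=8):
    """Issue one read per partition concurrently and merge rows in key order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input keys.
        per_partition = list(pool.map(query_partition, keys))
    return [row for rows in per_partition for row in rows]

keys = [("truck-1", d) for d in ("2023-01-01", "2023-01-02", "2023-01-03")]
rows = query_many(keys)
```

Which approach is faster still depends on schema and data volume, as noted above, so it is worth measuring both on your own workload.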
@d4lep0ro
4 months ago
Very well explained.
@pattheitguy
A year ago
I can't believe you made such a killer video, and after 2.3k views, 54 likes. That makes no sense. GREAT JOB!!!
@irtizahafiz
10 months ago
That means a lot! Thank you for watching.
@irtizahafiz
2 years ago
I would recommend checking out these videos about Cassandra before diving into this one: Cassandra Crash Course: kzitem.info/news/bejne/rJCpvIecioiGZ34 Cassandra Partition Keys: kzitem.info/news/bejne/tG-o05pqkWl2kno
@intrestingname7934
A year ago
Underrated content! There aren't many videos that are this clear and concise for Cassandra. I'm watching this at 2x and it's perfect lol. It's still good at 1x though when I need to focus and listen more closely.
@irtizahafiz
10 months ago
Haha, yeah, I think I speak slowly sometimes lol. English is not my first language. But glad you still found it helpful!
@reshabgupta8593
2 years ago
Hi, thanks for the video. I also wanted to know: according to approach 2, if I want 1 year of data, does that mean I need to run 365 queries concurrently? Or is there some other approach if there is no limit on the date range of queries?
@irtizahafiz
2 years ago
In that case, you could partition by the month instead of every single day. Then you make 12 concurrent requests, which is better. It totally depends on how quickly you expect your partitions to grow. Can you tell me a bit about your use case, especially how frequently you will be writing your data?
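To make the 365 versus 12 comparison concrete, here is a small sketch that lists the month buckets a date range touches. The `month_buckets` helper and the 'YYYY-MM' bucket format are assumptions for illustration; any stable month identifier would work as part of the partition key.

```python
from datetime import date

def month_buckets(start: date, end: date) -> list[str]:
    """Return the 'YYYY-MM' bucket for every month touched by [start, end]."""
    buckets = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        buckets.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            # Roll over to January of the next year.
            year, month = year + 1, 1
    return buckets

start, end = date(2022, 1, 1), date(2022, 12, 31)
daily_queries = (end - start).days + 1       # one query per day partition
monthly_queries = len(month_buckets(start, end))  # one query per month partition
```

For a full calendar year this gives 365 daily partitions against 12 monthly ones, which is the trade described in the reply.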
@reshabgupta8593
2 years ago
@irtizahafiz It's 20-30 writes per second.
@irtizahafiz
2 years ago
If you can partition by something else, alongside month, I think it will be alright. Following the example in the video, your partition key could be something like . Depending on your application, you can substitute truck_id with user_id, business_id, sensor_id, or whatever you have.
@reshabgupta8593
2 years ago
@irtizahafiz Sure, thanks. I have an id to partition by along with the month. Also, what is the upper limit on partition size? I read it should be around 100 MB, so if a month's data crosses this value, I believe I need to partition by 10-day periods instead. Also, thanks for your reply and videos.
@irtizahafiz
2 years ago
100 MB is a "recommended" upper limit, not a strict one. Yeah, you might want to do some estimations based on your write throughput and how wide you expect your table to be. No problem. I am glad you are getting some value out of them. If you have any suggestions to make them better, please let me know.
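A back-of-the-envelope version of that estimation might look like the sketch below. Only the 20-30 writes per second and the 100 MB guideline come from this thread. The 100 bytes per row, the 100 ids, and the assumption that writes spread evenly across ids are made-up inputs to replace with your own measurements.

```python
SECONDS_PER_DAY = 86_400

def partition_size_mb(writes_per_sec, bytes_per_row, days, num_ids):
    """Estimate the size in MB of one (id, time bucket) partition.

    Assumes writes are spread evenly across num_ids, which is a simplification.
    """
    rows = writes_per_sec * SECONDS_PER_DAY * days
    return rows * bytes_per_row / num_ids / 1_000_000

# 25 writes/s is the midpoint of the 20-30 range mentioned in the thread.
# bytes_per_row=100 and num_ids=100 are assumed values, not from the thread.
monthly_mb = partition_size_mb(25, bytes_per_row=100, days=30, num_ids=100)
ten_day_mb = partition_size_mb(25, bytes_per_row=100, days=10, num_ids=100)
```

Under these assumed inputs a monthly partition comes out at roughly 65 MB, below the 100 MB guideline. With fewer ids or wider rows the same formula would point toward a shorter bucket, such as the 10-day period suggested above.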
@einfacherkerl3279
2 years ago
What if I want to store data tracking a user's activities on different "objects"? I know that there would be tables that keep a log by object and a log by user. What I don't understand is that instead of using object id and user id as the partition key, you suggested using date as the partition key. Does that also apply in my case?
@irtizahafiz
2 years ago
So it totally depends on your access pattern. I would recommend adding date as part of the partition key if you expect each user_id and object_id to have lots of activities over a long period of time. If that's the case, and you don't add date to the partition key, your partitions will grow without bound and queries will start suffering after a period of time. On the other hand, if you add date as part of the partition key, you will have bounded partitions. But of course there's a tradeoff, because now you will have to make several queries to get all activities of a single user or object. Hope that helps.
Comments: 18