Even though your lecture is not exactly what I need - I listened you with pleasure and learnt from you a couple of things. Thank you!
@LeapFrogBI
7 жыл бұрын
Thank you, M B. That is awesome to hear.
@kehtahaipalpal
7 жыл бұрын
You are the best. I knew about this structure of data warehousing but never understood why do we take such an approach. Thanks for clearing it out.
@LeapFrogBI
7 жыл бұрын
Wow! Thanks Rohan. That is quite a compliment. Thank you for taking the time to leave feedback.
@grantongstad9593
7 жыл бұрын
these are the best DW videos on youtube, thank you so much
@LeapFrogBI
7 жыл бұрын
Thank you, Grant. I do appreciate the feedback very much. Stay tuned for more content in the works.
@markteague8889
5 жыл бұрын
I worked for an entire year on a data warehouse / BI team where the manager thought the best way to achieve the organizations goals was to utilize 3 SCRUM teams competing with each other. I guess the idea was that the competition between the teams would fuel innovation. However, in less than a year we were in a situation where the team that managed the OLTP data store was screaming because the competing teams (in their selfish interests to win said manager's approval / favor) had created multiple redundant versions of the same ETL processes to build the same dimensions. Since those processes ran during off business hours here in the USA, it was impacting the ability of the organization's users in the Asia-Pacific Region to interact with the transnational data. The situations finally became critical and we spent an entire SCRUM sprint / iteration identifying and eliminating the duplicate ETL processes. In that one sprint we managed to cut the load on the OLTP data store in half.
@paulbfelix
5 жыл бұрын
Hi Mark. This situation is quite unfortunate, but it sounds like it was taken care of. No matter how diligent we are on the tech side, organizational issues can surely cause havoc. Hopefully others read your short story as well, and maybe avoid this situation. Thanks for sharing!
@markteague8889
5 жыл бұрын
@@paulbfelix The particularly tragic part of this story is that the wisdom of more experienced team members was ignored by a junior manager who had been granted inordinate authority over the process and demonstrated incredible hubris along with his blind faith in the promise of agile methods to produce a better result. In the end, many more knowledgeable members of this team simply retired early or left the organization over their frustration.
@paulbfelix
5 жыл бұрын
@@@markteague8889 Yikes! Clearly there is no way for Agile or any other methodology to solve an organizational problem that has caused groups to work in silos. Hopefully the hard lesson was learned and the organization is on the right track.
@Mukhaiyam
10 жыл бұрын
Great work, I've been waiting for this so long. Do you know when this series is going to finish? Cheers,
@RIDDLE0MASTER
7 жыл бұрын
Paul, your videos are awesome! Thank you very much!
@LeapFrogBI
7 жыл бұрын
Thank so much, Riddle0Master. More is on the way so stay tuned!
@joshuac.abella8250
3 жыл бұрын
This is helpful thank you
@paulbfelix
3 жыл бұрын
Thank you for the feedback.
@easydoggy
7 жыл бұрын
Great video - thanks for your time and help to the community
@LeapFrogBI
7 жыл бұрын
You are very welcome. Thank you for taking the time to comment.
@markteague8889
5 жыл бұрын
On another note, it is sometimes more efficient to utilize the system of record to perform the transform phase. I was once tasked with building a currency dimension and associated fact table of exchange rates. This sounds like it should be a straight-forward process, but it is complicated by the fact that exchange rates are changing in real-time. You have to decide what magnitude of change in exchange rates you care enough about to create new fact table records. In this instance, the system of record was based on Oracle (whose PL/SQL variant of SQL possessed so-called LEAD/LAG functions) and no corresponding functionality existed in Microsoft SQL Server. I believe SQL Server has incorporated such functions in T-SQL since that time. I just mention this as an example of an instance when following the normal rules (well-prescribed) might have resulted in a more complex solution.
@paulbfelix
5 жыл бұрын
Totally agree, Mark. As a rule of thumb we try to perform transforms downstream of consolidation (psa), but every situation should be analyzed closely. The cost of transforming downstream as opposed to doing so in the source during initial data collection point must be weighed with all things considered.
@badrisivaprasad619
9 жыл бұрын
Great Sir, short and sweet
@briandenzer
8 жыл бұрын
This is excellent! Thank you!
@garrym1916
8 жыл бұрын
Hey Paul, do you guys have much to do with Biml?
@LeapFrogBI
8 жыл бұрын
+Gareth MacKinnon No, LeapFrogBI does not use BIML. BIML seems like a great product though. If I was going to build an SSIS package creation product today, I would consider using it.
@strayhandycat2753
7 жыл бұрын
Hello Paul, would you please tell me where my understanding of the second normal form is incorrect(I assume its incorrect, since I am very fresh to the field and the concept). Suppose we have a table with following attributes (UserID, First_name, Last_name, Address, City, State, Zip and SSN(social security)). The table is not in 2NF, since we have unique values (UserID and SSN) that can describe every record. So to satisfy requirements of 2NF the table needs to be split (not sure if its a proper term) into 2 tables (UserID, First_name, Last_name, Address, City, State, Zip) and (UserID, SSN). I have a strong feeling, that my example may sound like a bunch of **** (lets call it "nonsense", to remain the appropriate tone of the conversation). May I ask you to provide a simple example of a table in 1NF but not in 2NF and how we transform it into 2NF.
@LeapFrogBI
7 жыл бұрын
When using dimensional modeling we really don't worry much about Codd's rules. However, I believe your table would meet 2NF rules if you split it into 3 tables. Table 1- UserID, SSN, First_Name, Last_Name, Address. Table 2 - UserID, LocationID. Table 3 - LocationID (surrogate), City, State, Zip. This is 1NF as well as 2NF requires that a table meets 1NF rules. You could likely omit table 2, but if a customer moves to a new address this gives you the option to retain this history which is important if you want to know the customer location at a particular point in time such as when a sale occurred.
@strayhandycat2753
7 жыл бұрын
Thank you! Its much clearer to me now!
@strayhandycat2753
7 жыл бұрын
When would we need to denormalize tables for data warehousing purposes? Please share an example.
@LeapFrogBI
7 жыл бұрын
If you SOR stored customer date in 3 tables (name, address, demographics), but you want a single customer dimension you will denornmalize.
@strayhandycat2753
7 жыл бұрын
Hello! Would you please tell me, why we do not consider the step between SOR and PSA as a "transformation" step? I concluded from the video(after watching it for at least few times), that the data from SOR is going to be selected by a certain criteria, thus modified (or transformed). Please clarify this step for me. I really need to understand it by Thursday. Thanking you in advance!
@LeapFrogBI
7 жыл бұрын
Hi Maxim, Thanks for watching. You can consider the stage and PSA steps to be a transformation of sorts. You are correct that you have the option at this point to select a subset of the source data. You may also apply logic to only collect records that have been modified since the prior data collection; delta load. Data values, however, are not typically transformed at the stage or PSA step. In other words, data that is chosen to be collected from the source system will not be manipulated until after it is loaded into PSA. Make sense? To complicate matters a bit it is also important to consider where the data is being collected from. If you are collecting data from a SOR view or you are using a SQL Statement to collect data, then it is possible that transformations have already been applied prior to the stage processes. Personally, I consider this to be SOR transformations, but you could argue that a SQL statement that manipulates data that is collected upstream of staging is also a transformation. I hope this helps.
@strayhandycat2753
7 жыл бұрын
Mr.Paul, thank you so much for such a detailed answer. It is becoming clearer to me. Few more questions. 1)In a simple way, may I ask you, to share examples of "transformation" , In other words, what is input, what is the output? 2) What are the most popular ETL tools? Can we consider "Microsoft SQL Management Studio" an ETL tool? 3)I remember, I read somewhere, that data in warehouses should be in the second normal form, thus, if data in SOR in 3rd normal form, denormalization takes place. Would you say it is a correct statement? If yes, can you say why? Also, can you give an example of a table, that is in 1NF form, but not in 2NF. I understand 1NF and 3NF, but 2 NF is a bit confusing.(Information on youtube, and web in general is often very vague..). Thank you so much!
@strayhandycat2753
7 жыл бұрын
Also, can we say, that a monthly , lets say sales, report, that is ECTRACTED from the DB, then TRANSFORMED into pdf, end LOADED on the site(or emailed to business users) is a proper example of ETL? Just trying to prepare for the question: give me an example of etl?
@strayhandycat2753
7 жыл бұрын
Is the following scenario is an example of etl? Suppose a web store, if on checkout user uploads a CSV file with recipients(extraction), then the system sends the file to a third party service to perform address validation, and returns the user another CSV file with 2 new columns (address_valid(T or F) and (error_description)) (transformation). Then the file is sent back to the user(Load)?
@PaulBFelixAdventures
7 жыл бұрын
1) Transformation= Convert date to year, Input = 2017-01-01, Output = 2017 2) SSMS is not typically considered an ETL tool. In the Microsoft stack, SSIS is the ETL tool. There are many others; Informatica, Data Stage, Snaplogic, and many more. 3) No. I do not agree that a data warehouse should be 2NF as a rule although this is true that a data warehouse will meet 2NF's definition at times. Denormalization does nearly always take place when building a data warehouse. OLTP systems are highly normalized because they are built to manage relatively small transactions very quickly. Data Warehouses are designed to answer analysis questions that involve many more records that an a typical OLTP transaction. See my video called "What is a data warehouse" for a lot more reasons.
@Metamorphose555
8 жыл бұрын
Excellent, thank you
@Metamorphose555
8 жыл бұрын
How do you handle change in source schema with this pattern ?
@LeapFrogBI
8 жыл бұрын
+Metamorphose555 When the source schema changes, the impacted data flow must be updated throughout. Tools like LeapFrogBI make this process much easier, but regardless of the tool being used you will need to adjust to changing source systems.
@Metamorphose555
8 жыл бұрын
LeapFrogBI Thanks for the answer. I'm implementing this pattern on AWS cloud. Sources are Couchbase, SQLs on EC2,, RDS (with SQL), S3 and Elastic. Any recommendation for my ETL tool of choice?
@LeapFrogBI
8 жыл бұрын
+Metamorphose555 Sorry for the slow reply, Metamorphose555. I use SQL Server in AWS without any issues. SSIS gets the job done. LeapFrogBI can also be used to streamline the process (Shameless plug).
@4suc6
5 жыл бұрын
May be you should beter figure out the most important and say it in 1 minute. Very unclear why is that psa.
Пікірлер: 47