ETL is the process through which data is inserted into a database, or more specifically into a data warehouse. As the name (Extract, Transform, Load) suggests, it involves three steps to get data into the data warehouse. Let’s discuss these steps now.
- Extract: This is the process of extracting data from an external source. The external source could be as simple as an application interface that collects data, or as complex as another data warehouse. In this step we collect the information from the external source that could be useful to us, including information we may want to keep for future use. At this point, however, the information is raw: loading it as-is could make our data warehouse unstable or fill it with data we do not need.
- Transform: This is the phase where we convert the raw information to fit our needs. We define the business logic here and apply it to the information extracted from the external source, calculating exactly the values our data warehouse needs. In short, we modify or filter the extracted information to suit our needs.
- Load: This is the final phase, where we take the transformed information and insert it into our data warehouse. In simple words, in this phase we load the information into the end target.
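
The three steps can be sketched as a small Python script. This is only a minimal illustration, not a production pipeline: the `RAW_CSV` sample data, the `orders` table, the function names, and the business rule (drop rows with a missing or non-positive amount) are all assumptions made up for this example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source (made up for illustration).
RAW_CSV = """order_id,customer,amount,currency
1,alice,10.50,USD
2,bob,,USD
3,carol,7.25,usd
4,dave,-3.00,USD
"""


def extract(source_text):
    """Extract: read raw rows from the external source without changing them."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: apply the (assumed) business rules to the raw rows."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # filter out rows with a missing amount
        amount = float(row["amount"])
        if amount <= 0:
            continue  # filter out rows with a non-positive amount
        cleaned.append(
            (
                int(row["order_id"]),
                row["customer"].title(),   # normalize customer names
                amount,
                row["currency"].upper(),   # normalize currency codes
            )
        )
    return cleaned


def load(conn, records):
    """Load: insert the transformed records into the end target."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(source_text, conn):
    """Run the three steps in order and report how many rows each stage saw."""
    raw = extract(source_text)
    records = transform(raw)
    load(conn, records)
    return len(raw), len(records)


if __name__ == "__main__":
    # An in-memory SQLite database stands in for the data warehouse.
    conn = sqlite3.connect(":memory:")
    extracted, loaded = run_etl(RAW_CSV, conn)
    print(f"extracted {extracted} rows, loaded {loaded} rows")
```

Keeping each step in its own function mirrors the description above: the extract step leaves the data raw, all filtering and normalization lives in the transform step, and only cleaned records ever reach the target table.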