Data warehouse, data mining and BI (Business Intelligence) are technologies which are used to make the stored data helpful for decision making and they are explained about.
Data Warehouse
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.
Subject-Oriented – A data warehouse can be used to analyze a particular subject area. For example, “sales” can be a particular subject.
Integrated – A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
Time-Variant – Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.
Non-volatile – Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
ETL tools perform three functions to move data from one place to another:
- Extract data from sources such as ERP or CRM applications;
- Transform that data into a common format that fits with other data in the warehouse; and,
- Load the data into the data warehouse for analysis.
Each step in the process has its challenges like during the extract step, data may come from different source systems (e.g. Oracle, SAP, Microsoft) and different file formats such as XML, flat files with delimiters (e.g. CSV), or the worst – old legacy systems that store data in arcane formats no one else uses anymore.
Data Mining
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information – information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
Business Intelligence (BI)
Business Intelligence is collection of technologies and applications which combined together for
gathering, analyzing and storing data (or information) which can be fruitful for an organization, and helps the organization employees to make better, smart and profitable business decisions.
Various components of business intelligence configuration are
- Source systems which gather the information that will be analyzed.
- Information from the source systems are stored in their native databases or other data sources. Typically the database is configured for speed of processing.
- Information from the data sources goes through a process known as ETL where the data is extracted from the source system, transformed (to meet business needs) and loaded into a data warehouse. Many different data sources can be consolidated.
- Information from the data warehouse is made available to end-users in the form of data marts where the data is organized to answer specific types of business questions (e.g., sales data can be cross referenced by product, region, time, sales representative, etc…)
- Finally, reporting and analytic tools are used to analyze the information in the data marts. This genre of reporting tools are known collectively as on-line analytical processing (OLAP for short.) There are a few different variations of OLAP tools: MOLAP or multi-dimensional OLAP and relational OLAP or ROLAP. Each has its advantages and disadvantages.