Power BI Notes

Chapter 1: Understanding Business Intelligence and Power BI

Business intelligence is all about leveraging data to make better decisions.

DAX, short for data analysis expression.

The concepts of business intelligence can be broken down into five areas:


Domain

Domain is simply the context where business intelligence is applied.


Data

Once a domain has been decided on, the next step is identifying and acquiring the data that’s pertinent to that domain. This means identifying the sources of relevant data. These sources may be internal or external to an organisation and may be structured, unstructured, or semi-structured in nature.

Internal and External Data

Structured, Unstructured, and Semi-Structured Data

Structured data is data that conforms to a rather formal specification of tables with rows and columns.

For example:

In addition, this category also includes such things as Azure table storage and relational database standards such as Open Database Connectivity and Object Linking and Embedding database, which are supported standards for accessing a wide variety of relational databases as well as other data storage systems.


Unstructured data is effectively the opposite of structured data. Unstructured data cannot be organised into simple tables with rows and columns.

It’s basically stored as binary large objects (BLOBs), online files, or files in a file system such as the new technology file system (NTFS) or the Hadoop Distributed File System.


Semi-structured data has a structure but does not conform to the formal definition of structured data, that is, tables with rows and columns. Examples include:

Semi-structured data also includes NoSQL databases, which are specifically designed to store both structured and unstructured data.

They include data stores such as:

Document Database

Generally stores data in JSON, binary JSON, BSON, or XML and includes Microsoft Azure Cosmos DB, MongoDB, Cloudant IBM, Cassandra, and MarkLogic.

Graph Databases

Represent and store data using graph concepts such as nodes, edges, and properties. Examples include Neo4j and HyperGraph DB.

Key-Value Stores

Also known as dictionaries or hash tables, treat data as records that can each have a different number and types of fields. Redis, Aerospike, AWS DynamoDB, CouchBase, DataStacks, Cassandra, and MapR technologies are examples of key-value stores.

Wide Column Stores

Wide column stores organise data into column families versus rows, like traditional relational databases. Examples include Cassandra and Edge Base.


Semi-structured data also includes data access protocols such as Open Data Protocol or Data and other Representational State Transfer (REST) application programming interfaces (APIs). These protocols provide interfaces to data sources such as Microsoft SharePoint, Microsoft Exchange, Microsoft Active Directory, and Microsoft Dynamics, social media systems such as Twitter and Facebook, as well as other online systems such as MailChimp, Salesforce, SmartSweep, Twilio, Dual Analytics, and Github. These data protocols abstract how the data is stored, whether that is a relational database, a NoSQL database, or simply a bunch of files.


Model

Model, also known as semantic model or data model, refers to the way in which one or more data sources are organised to support analysis and visualisation. Models are built by transforming and cleansing data, which helps define the types of data within those sources, as well as to categorise the data according to its specific data types.

Building a model generally involves three elements:

Transforming and Cleansing

Transforming and cleansing technologies are often referred to as Extract, Transfer, Transform, Load (ETL) tools and include products such as:

Defining and Categorising

Semantic models also formally define the types of data within each table. This data type includes:

Organising

Some sources of data have pre-built models. In other words, there are defined relationships between data entities or tables within the data source.

Data warehouses are traditionally built upon online analytical processing (OLAP) technology and include systems such as Microsoft Analysis Server, Snowflake etc.

Note that OLAP refers to a method of storing data that optimises read operations for reporting purposes. This is in contrast to online transaction processing (OLTP) systems, which store data in a manner that is optimised for write operations.