Blockchain quick facts
Blockchain technology is a fast-growing disruptive technology that enables verified data to be shared among a set of untrusted parties. Moreover, blockchain makes it possible to share ledgers of items of value and control the exchange of these items in an untrusted environment.
Today’s blockchains come in public, private, and hybrid versions that support complex software applications. Learning about blockchain technology will help you understand how people and organizations will conduct business in the near future and beyond.
The following list summarizes some of blockchain technology’s most important features:
- A blockchain is a chain of blocks, where each block stores the mathematical hash value of the previous block, connecting it to its predecessor.
- Any change to a block invalidates that block and all subsequent blocks in the chain.
- The most common use of a blockchain is to transfer some item of value (crypto-asset) from one account to another.
- Crypto-asset owners are identified by an address, which is related to the account’s public encryption key.
- Crypto-assets exist only in a blockchain, and each one is associated with some value in the real world.
- A transaction transfers a crypto-asset from one account (owner) to another.
- All transactions are digitally signed with the crypto-asset owner’s private key.
- Cryptography makes it easy to verify a transaction’s owner (digital signature) and a block’s integrity (hash).
- Smart contracts are programs that define rules that control how data gets added to, and read from, the blockchain.
- Smart contracts must run the same way, and produce the same results, on every network node instance.
- Like data, smart contract code is stored in a blockchain block and can never be changed.
- Databases support create, read, update, and delete (CRUD) operations, whereas blockchains support only read and write operations.
- Blockchain’s add-only property keeps blocks in chronological order and makes it easy to trace a crypto-asset throughout its lifecycle (forward and backward).
- The most popular consensus mechanism used today is Proof-of-Work (PoW), which requires enormous amounts of energy. However, other consensus algorithms, such as Proof-of-Stake (PoS), are becoming more popular; they can help blockchains handle transactions faster and make the technology a better fit for more applications.
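The hash-linking idea in the list above can be illustrated with a small sketch. This is a toy model, not how any particular blockchain stores blocks: the block layout and the function names (`block_hash`, `build_chain`, `first_invalid_block`) are assumptions made for illustration, and there is no consensus, signing, or networking involved.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents (everything except its own stored hash)."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(records: list) -> list:
    """Build a toy chain in which each block stores the previous block's hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder: the first block has no predecessor
    for index, data in enumerate(records):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        block["hash"] = block_hash(block)
        chain.append(block)
        prev_hash = block["hash"]
    return chain


def first_invalid_block(chain: list):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(block):
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"])
print(first_invalid_block(chain))  # None: the untouched chain verifies

chain[1]["data"] = "Bob pays Carol 200"  # tamper with a middle block
print(first_invalid_block(chain))  # 1: the stored hash no longer matches the data

chain[1]["hash"] = block_hash(chain[1])  # the tamperer recomputes that block's hash
print(first_invalid_block(chain))  # 2: the next block still points at the old hash
```

Even when the tampered block is given a fresh hash, the break simply moves to the next block, which is why a change to one block invalidates every block after it.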
Data analytics quick facts
Data analytics is all about finding hidden nuggets of valuable information in data. If the information you’re looking for were easy to find, you wouldn’t need analytics.
The real power of data analytics is in its capability to learn from the past in ways that can help you improve the chances for success in the future. Success might be measured as increased sales, reduced costs, or having the right products in the right place at the right time.
Understanding the different analytics models and their use is key to unlocking your data’s secrets. The following list summarizes some of the most common analytics techniques and models you’ll use when analyzing blockchain data:
- Every analytics model should exist to satisfy a specific business goal.
- Before selecting any analytics model, create a data inventory (of on-chain and off-chain data).
- Build an analytics lab that allows you to experiment in an isolated environment.
- Determine your primary goals: identification, explanation, prediction, or any combination thereof.
- Clustering models show relationships among objects (identification).
- Association models can reveal objects that frequently exist together in transactions (explanation).
- Classification models determine to which group a new object belongs (identification).
- Prediction models predict future outcomes based on historical data (prediction).
- Object characteristics, or attributes, are often called features.
- A model’s output quality depends on selecting the best features to analyze.
- Scatterplots can help determine which feature sets affect outcomes.
- K-means is a popular clustering algorithm that reveals relationships between objects by identifying object clusters.
- Apriori is a useful algorithm for showing objects that occur together in transactions frequently (market basket analysis).
- Decision tree and naïve Bayes are both classification algorithms that help determine ways to label objects based on a limited number of labels.
- Regression algorithms (linear regression for continuous outcomes and logistic regression for categorical outcomes) can predict future behavior based on historical data.
- Time-series analysis algorithms can remove cyclic and seasonal variations to reveal trends in data.
- Creating simple and effective visualizations of model results is important for communicating your results.
- Every analytics model should be validated with metrics to assess its accuracy and output significance.
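As one example of the techniques listed above, here is a minimal k-means sketch in plain Python. The feature pairs (transfers per day, average transfer value) and the account data are made up for illustration, and seeding the centroids with the first k points is a simplification; library implementations typically use random or smarter initialization and expect features to be scaled first.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Simplification: seed centroids with the first k points, not a random draw.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical features per account: (transfers per day, average transfer value)
accounts = [(1, 2), (50, 1), (2, 3), (48, 2), (1.5, 2.5), (52, 1.5)]
labels, centroids = kmeans(accounts, k=2)
print(labels)     # [0, 1, 0, 1, 0, 1]
print(centroids)  # [(1.5, 2.5), (50.0, 1.5)]
```

The two clusters separate low-activity accounts from high-activity ones, which is the kind of grouping a clustering model is meant to surface.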
Extracting data from a blockchain quick facts
Data analytics models rely on data. You’ll need data to choose the right model, build it, train it, and then run it using new data. The process of preparing input data for an analytics model can be tedious.
In a blockchain environment, building an analytics dataset includes identifying the data you’ll need, fetching it from the blockchain, and then completing the data picture with related data from off-chain repositories. Although you follow the same basic steps each time you populate a model, adding blockchain to the process adds another layer of requirements. The following list summarizes common concepts for accessing blockchain data and techniques for building a dataset from a blockchain:
- Use a blockchain explorer to examine all blockchains of interest for data.
- Smart contracts store state data and generate transaction data.
- Smart contracts generate events that result in log file entries (valuable for a timestamped record of actions).
- Build an analytics lab to support connecting to all blockchains of interest using your favorite language, such as Python or JavaScript.
- Decide between extracting data from the blockchain first (faster for multiple runs) or letting the analytics model collect blockchain data in real time (access to the latest data).
- When a model needs off-chain data as well as on-chain data, prebuilding datasets is generally easier.
- Establish a strategy for aligning on-chain and off-chain identities.
- Access blockchain data through event filters, smart contract functions, or state data queries.
- Cleanse blockchain data and convert any formats or types as necessary.
- Identify related off-chain data (or on-chain data from another blockchain) and fetch related data.
- Either store full data in a format suited for reading into a dataframe or develop conversion code.
- Identify partitions in your dataset to use for training and testing your model. Selecting objects for each partition at random (or quasi-randomly) helps keep both partitions representative, which makes your model’s measured accuracy more trustworthy.
- Devise a strategy for updating extracted data with fresh data.
- Plan to re-train models as your dataset changes.
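Several of the steps above (cleansing, aligning on-chain and off-chain identities, and partitioning) can be sketched without connecting to a real blockchain. Everything in this example is hypothetical: the record layout, the shortened addresses, the customer segments, and the function names are stand-ins, and the value conversion assumes Ethereum-style amounts stored as hexadecimal wei. In practice the raw records would come from event filters or contract calls made through a client library.

```python
import random
from datetime import datetime, timezone

WEI_PER_ETHER = 10**18  # assumption: Ethereum-style values denominated in wei


def cleanse(event):
    """Normalize one raw record: lowercase address, hex wei to ether, epoch to UTC."""
    return {
        "address": event["from"].lower(),
        "value_eth": int(event["value"], 16) / WEI_PER_ETHER,
        "time": datetime.fromtimestamp(event["timestamp"], tz=timezone.utc),
    }


def enrich(rows, off_chain):
    """Attach off-chain attributes by matching on the normalized address."""
    return [{**row, "segment": off_chain.get(row["address"], "unknown")} for row in rows]


def split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy with a fixed seed, then cut into training and test partitions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records shaped loosely like decoded smart contract event logs.
raw_events = [
    {"from": "0xAAA1", "value": "0xde0b6b3a7640000", "timestamp": 1700000000},
    {"from": "0xbbb2", "value": "0x1bc16d674ec80000", "timestamp": 1700000600},
    {"from": "0xAaA1", "value": "0x6f05b59d3b20000", "timestamp": 1700001200},
    {"from": "0xCCC3", "value": "0xde0b6b3a7640000", "timestamp": 1700001800},
]
# Hypothetical off-chain table keyed by lowercase address.
off_chain_customers = {"0xaaa1": "retail", "0xbbb2": "wholesale"}

rows = enrich([cleanse(e) for e in raw_events], off_chain_customers)
train, test = split(rows)
print(rows[0]["value_eth"], rows[0]["segment"])  # 1.0 retail
print(rows[3]["segment"])  # unknown
print(len(train), len(test))  # 3 1
```

Fixing the shuffle seed keeps the partitions reproducible between runs; when fresh data is extracted, the same functions can be rerun before re-training.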