Data Integrity and Lineage by using IOTA

2018-04-16

DLT, Data Integrity, Data Lineage, Distributed Ledger Technology, IOTA, IoT, Tangle

Edit log:

2018.09.25
This article has been expanded into an article series with more detailed discussion and open-source code. Check them out!

2018.08.26 - Updated the data schema:

Introduced a unified format that covers both the lightweight format and the standard format, and is more flexible and self-explanatory.
Specified the mandatory and optional fields in the format. For example, Timestamp is now an optional field.

Introduction

If we say “Data is the new oil”, then data lineage is an issue that we must solve. Various data sets are generated (most likely by sensors), transferred, processed, aggregated, and flow from upstream to downstream.

The goal of data lineage is to track data over its entire lifecycle, to gain a better understanding of what happens to data as it moves through the course of its life. It increases trust in, and acceptance of, the results of data processing. It also helps trace errors back to their root cause and supports compliance with laws and regulations.

Example of data lineage visualization, for Report 1, 2 and 3

You can easily compare this with the traditional supply chain of raw materials in the manufacturing and logistics industries. However, compared with those traditional industries, data lineage faces new challenges.

Running IOTA Full Node

2018-02-19

Blockchain, DLT, Distributed Ledger Technology, Fullnode, IOTA

I have been looking at IOTA since last winter, as it seems promising for IoT, Machine-to-Machine Micro-payments and Data Market scenarios.

Installing an IOTA light wallet is pretty straightforward, but running a full node is not. Thanks to the great playbook, I managed to set up a Virtual Private Server to run as an IOTA full node.

2 CPU cores
4 GB memory
SSD
Hosted 24/7 in a data center in Western Europe

Infrastructure-as-Code and CI/CD in the real world, with VSTS and Azure (Part 1)

2017-11-05

Azure, Azure Template, Build, Continuous Deployment, Continuous Integration, DevOps, Infrastructure as Code, Release Management, VSTS

Hello again!

It has been a while since my last post. That is because I was quite busy leading a team in a program for delivering veracity.com, the open industry data platform from DNV GL. It is a pretty exciting project - to build an open, independent data platform with bleeding-edge technologies, to serve a large user base (100 000 registered users). You can read more about veracity here and here.

There is actually a long and interesting story behind veracity (and its predecessor), together with all the challenges that we encountered on this journey. Hopefully I can share them with you in the future.

Anyway, today I would like to talk about what Infrastructure-as-Code looks like in the real world, together with Azure and VSTS.

OAuth in Azure AD B2C with Nodejs

2017-06-28

Authentication, Authorization, Azure, Azure AD B2C, Nodejs, OAuth, OpenID

Recently we needed to build a Nodejs single-page application (SPA) solution that uses Azure AD B2C as the identity provider (IdP). Since it is a single-page application, we are going to use the OAuth2 Implicit Flow.

This article demonstrates the basic steps for setting up both the server side (WebAPI) as well as the client application.

Using python to organize pictures

2017-05-17

Python, Scripting

Problem

Having several digital cameras is fun: you can have different photography experiences.

However, organizing pictures is far less interesting, especially if you do not have a consistent process (like a naming convention) for archiving. After several years, I ended up with a hundred thousand pictures sitting in huge, messy folders:

Nikon_Pictures
Backup_SDCard01
100_0302
DCIM_From_Old_Phone
100CANON
Backup-Photo
etc…

The trickiest part is that I have so many duplicate pictures everywhere, due to inconsistent archiving over the years. It is so messy that I never dared to clean them up manually.

Naturally, the knowledge of programming came to my rescue. This time, it is Python.
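The duplicate problem described above can be sketched in a few lines. This is a minimal illustration, not the script from the full post: it assumes that "duplicate" means byte-for-byte identical content, and the names file_digest and find_duplicates are hypothetical. It groups files by size first, so that only files with a matching size are hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    # Assumption: identical content is what counts as a duplicate picture.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Calling find_duplicates(Path("Nikon_Pictures")) would return one list per set of identical files; deciding which copy to keep is left to the caller.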

Tracking subdomains with Google Tag Manager

2017-04-16

Google Analytics, Google Tag Manager

Recently I have been investigating how to track user behavior across our digital services.

We have web applications like:

example.com (the company homepage)
service-A.example.com (digital service A)
service-B.example.com (digital service B)

and we are using Google Tag Manager (and Google Analytics).

Setup VIM plugin

2017-04-09

Plugin, Vim

Time to revisit my VIM plugin system, now that VIM has been upgraded to version 8.0.

Previously I was using Vundle, but it is a bit complicated to set up quickly. This time I am using vim-plug.

Jump-start ASP.Net Core with Docker

2017-03-31

ASP.Net Core, Docker

Environment Setup

I have a typical developer's environment: Windows 10 Enterprise X64 (Version 1511, OS build 10586.839), with DotNet Core 1.0.1 and VS Code installed. In VS Code there are two extensions installed.

Enable Hyper-V

VirtualBox is no longer needed! Simply enable Hyper-V on Windows 10 by running the following PowerShell commands (as Administrator):

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All
DISM /Online /Enable-Feature /All /FeatureName:Microsoft-Hyper-V

You might need to change the BIOS setting. Read more here.

Note:
The Docker documentation also mentions that virtualization must be enabled, and says you can verify this in the Task Manager. However, I could not find the “Virtualization” label in my Task Manager. The following steps worked fine anyway.

Install Docker

Head to Docker official site, download and install Docker for Windows. The version I installed was 17.03.1-ce, build c6d412e Community Edition, via Edge channel.

Let's verify it.

Review and manage your web site disk space in Azure

2017-02-23

Azure, DevOps, KUDU, Monitoring

Problem

We have a large distributed system which is hosted in Azure. The front-end web applications are Azure web sites.

From time to time, the web applications went down because the Azure web sites ran out of disk space. Our operations team would like to quickly identify which files are large and how we can free up disk space in the Azure web sites.

Luckily, Azure App Service already provides a nice tool for this type of work: the Kudu service.
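Independently of Kudu, the underlying task of finding the largest files under a folder can be sketched in plain Python. This is only an illustration of the idea, not Kudu's API or the approach used in the full post; the function name largest_files and the top_n parameter are hypothetical.

```python
import heapq
import os
from typing import List, Tuple


def largest_files(root: str, top_n: int = 10) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # the file vanished or is unreadable; skip it
    return heapq.nlargest(top_n, sizes)
```

Run against the site's content root, the first few entries point at the files most worth cleaning up.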