Data Integrity and Lineage by using IOTA

Edit log:

2018.09.25
This article has now been expanded into an article series, with more detailed discussion and open-source code. Check them out!

2018.08.26 - Updated the data schema:

  1. A unified format that covers both the lightweight format and the standard format, but is more flexible and self-explanatory.
  2. Specified the mandatory and optional fields in the format. For example, Timestamp is now an optional field.

Introduction

If we say “Data is the new oil”, then data lineage is a problem that we must solve. Various data sets are generated (most likely by sensors), transferred, processed, aggregated, and passed from upstream to downstream.

The goal of data lineage is to track data over its entire lifecycle, to gain a better understanding of what happens to data as it moves through the course of its life. It increases trust in, and acceptance of, the results of data processing. It also helps to trace errors back to their root cause and to comply with laws and regulations.

You can easily compare this with the traditional supply chain of raw materials in the manufacturing and logistics industries. However, compared to those traditional industries, data lineage faces new challenges.
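To make the idea concrete, here is a minimal Python sketch of integrity plus lineage. It is not the IOTA-based design from the article series. It only assumes that each data set gets a record holding a SHA-256 fingerprint of its content and the IDs of the upstream records it was derived from. `LineageRecord`, `verify` and `trace_to_roots` are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def digest(payload: bytes) -> str:
    """SHA-256 hex digest used as the integrity fingerprint of a data set."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageRecord:
    # Hypothetical fields for illustration; the real schema is in the article series.
    record_id: str
    data_hash: str                                    # fingerprint of the data set
    parents: List[str] = field(default_factory=list)  # upstream record IDs
    timestamp: Optional[str] = None                   # optional, like Timestamp in the schema


def verify(record: LineageRecord, payload: bytes) -> bool:
    """True if the payload still matches the fingerprint recorded for it."""
    return digest(payload) == record.data_hash


def trace_to_roots(record_id: str, registry: Dict[str, LineageRecord]) -> List[str]:
    """Walk upstream through the parents and return the root record IDs (no parents)."""
    roots, seen, stack = [], set(), [record_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a data set can feed several downstream steps
        seen.add(current)
        parents = registry[current].parents
        if not parents:
            roots.append(current)
        stack.extend(parents)
    return sorted(roots)


# Usage: two raw sensor data sets aggregated into one hourly average.
raw_a = LineageRecord("sensor-a", digest(b"21.5,21.7"))
raw_b = LineageRecord("sensor-b", digest(b"19.9,20.1"))
avg = LineageRecord("hourly-avg", digest(b"20.8"), parents=["sensor-a", "sensor-b"])
registry = {r.record_id: r for r in (raw_a, raw_b, avg)}
print(trace_to_roots("hourly-avg", registry))  # ['sensor-a', 'sensor-b']
print(verify(avg, b"20.9"))                    # False: the payload was altered
```

In this sketch the registry is just an in-memory dict. The point of using a distributed ledger such as IOTA is to store such records somewhere tamper-evident instead.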

Running IOTA Full Node

I have been looking at IOTA since last winter, as it seems promising for IoT, Machine-to-Machine micropayments, and Data Market scenarios.

Installing an IOTA light wallet is pretty straightforward, but running a full node is not. Thanks to the great playbook, I managed to set up a Virtual Private Server to run as an IOTA full node, with the following specification:

  • 2-core CPU
  • 4 GB memory
  • SSD
  • Hosted 24/7 in a data center in Western Europe
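Once the node is running, the first thing to check is whether it has synced with the network. The sketch below assumes the node runs IRI, the Java reference implementation widely used in 2018, with its HTTP API on the default port 14265. It sends the `getNodeInfo` command and treats the node as synced when `latestSolidSubtangleMilestoneIndex` has caught up with `latestMilestoneIndex`. Newer IOTA node software exposes a different API, so treat this as a sketch rather than a recipe.

```python
import json
import urllib.request

IRI_URL = "http://localhost:14265"  # assumption: IRI API on its default port


def get_node_info(url: str = IRI_URL, timeout: float = 10.0) -> dict:
    """POST the IRI getNodeInfo command and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"command": "getNodeInfo"}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-IOTA-API-Version": "1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_synced(info: dict) -> bool:
    """Synced when the solid milestone equals the latest milestone.

    Missing fields are treated as "not synced".
    """
    latest = info.get("latestMilestoneIndex", 0)
    solid = info.get("latestSolidSubtangleMilestoneIndex", -1)
    return latest > 0 and latest == solid
```

On the server, `is_synced(get_node_info())` should turn from False to True once the node has caught up with the network.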

Infrastructure-as-Code and CI/CD in the real world, with VSTS and Azure (Part 1)

Hello again!

It has been a while since my last post, because I have been quite busy leading a team in a program to deliver veracity.com, the open industry data platform from DNV GL. It is a pretty exciting project: building an open, independent data platform with bleeding-edge technologies to serve a large user base (100 000 registered users). You can read more about veracity here and here.

There is a long and interesting story behind veracity (and its predecessor), along with all the challenges we encountered on this journey. Hopefully I can share it with you in the future.

Anyway, today I would like to talk about what Infrastructure-as-Code looks like in the real world, with Azure and VSTS.

OAuth in Azure AD B2C with Nodejs

Recently we needed to build a Nodejs single-page application (SPA) that uses Azure AD B2C as the identity provider (IdP). Since it is a single-page application, we are going to use the OAuth2 Implicit Flow.

This article demonstrates the basic steps for setting up both the server side (WebAPI) as well as the client application.
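On the client side, the Implicit Flow comes down to one redirect to the B2C authorize endpoint and one parse of the URL fragment on the way back. The post builds this in Nodejs. Purely to show the protocol parameters, here is a hedged Python sketch. The tenant, policy name, client ID, redirect URI and API scope are placeholders, and the `b2clogin.com` endpoint shape is an assumption to check against your tenant's OpenID metadata.

```python
import secrets
from typing import Dict, Tuple
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_authorize_url(tenant: str, policy: str, client_id: str,
                        redirect_uri: str, api_scope: str) -> Tuple[str, str, str]:
    """Return (url, state, nonce) for an Azure AD B2C implicit-flow sign-in request."""
    state = secrets.token_urlsafe(16)  # echoed back in the redirect; guards against CSRF
    nonce = secrets.token_urlsafe(16)  # embedded in the id_token; guards against replay
    base = (f"https://{tenant}.b2clogin.com/{tenant}.onmicrosoft.com/"
            f"{policy}/oauth2/v2.0/authorize")
    query = urlencode({
        "client_id": client_id,
        "response_type": "id_token token",  # ID token for the SPA, access token for the WebAPI
        "redirect_uri": redirect_uri,
        "response_mode": "fragment",        # tokens come back in the URL fragment
        "scope": f"openid {api_scope}",
        "state": state,
        "nonce": nonce,
    }, quote_via=quote)                     # encode spaces as %20
    return f"{base}?{query}", state, nonce


def parse_redirect(redirect_url: str, expected_state: str) -> Dict[str, str]:
    """Extract the token parameters from the redirect fragment after checking state."""
    fragment = urlsplit(redirect_url).fragment
    params = {key: values[0] for key, values in parse_qs(fragment).items()}
    if params.get("state") != expected_state:
        raise ValueError("state mismatch: possible CSRF or stale response")
    return params
```

The SPA would then send the access token to the WebAPI as `Authorization: Bearer <token>`, and the nonce should also be checked against the `nonce` claim of the ID token. That check is not shown here.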

Using Python to organize pictures

Problem

Having several digital cameras is fun: you can have different photography experiences.

However, organizing pictures is far less interesting, especially if you do not have a consistent archiving process (such as a naming convention). After several years, I ended up with a hundred thousand pictures sitting in huge, messy folders:

  • Nikon_Pictures
  • Backup_SDCard01
  • 100_0302
  • DCIM_From_Old_Phone
  • 100CANON
  • Backup-Photo
  • etc…

The trickiest part is that I have duplicate pictures everywhere, due to years of inconsistent archiving. It is so messy that I have never dared to clean them up manually.

Naturally, programming came to my rescue. This time, it is Python.
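The core of duplicate detection fits in a few lines. This is a minimal sketch, not necessarily the script from the post. It groups files by size first, because files of different sizes cannot be identical, and only hashes those candidates, using SHA-256 as the content fingerprint. The extension list is an assumption; adjust it to your own cameras.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List

# Assumption: typical JPEG/PNG plus Nikon (.nef) and Canon (.cr2) RAW files.
PICTURE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".nef", ".cr2"}


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large RAW files are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> List[List[str]]:
    """Return groups of picture paths under root that share size and SHA-256 hash."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in PICTURE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: Dict[str, List[str]] = defaultdict(list)
        for path in paths:
            by_hash[file_sha256(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling `find_duplicates("D:/Photos")` (a hypothetical path) returns groups of identical files across folders such as `Nikon_Pictures` and `Backup_SDCard01`. Review the groups before deleting anything.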

Tracking subdomains with Google Tag Manager

Recently I have been investigating how to track user behavior across our digital services.

We have web applications like:

  1. example.com (the company homepage)
  2. service-A.example.com (digital service A)
  3. service-B.example.com (digital service B)

We are using Google Tag Manager (and Google Analytics).
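Why can one tracking setup cover all three sites? In analytics.js, setting `cookieDomain` to `'auto'` writes the `_ga` client-ID cookie to the highest-level domain it can (here `example.com`). Browsers send a cookie with that Domain attribute to every subdomain. The sketch below implements the domain-matching rule from RFC 6265, section 5.1.3, to show which hosts would share the cookie. It illustrates the browser rule only and is not Google Tag Manager configuration.

```python
import ipaddress


def is_ip_address(host: str) -> bool:
    """True for IPv4/IPv6 literals, which never match a parent domain."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_match(host: str, cookie_domain: str) -> bool:
    """Domain matching per RFC 6265, section 5.1.3 (compared case-insensitively)."""
    host = host.lower()
    cookie_domain = cookie_domain.lower()
    if cookie_domain.startswith("."):
        cookie_domain = cookie_domain[1:]  # a single leading dot is ignored (section 5.2.3)
    if host == cookie_domain:
        return True
    # Otherwise the cookie domain must be a dot-separated suffix of a host name.
    return host.endswith("." + cookie_domain) and not is_ip_address(host)


hosts = ["example.com", "service-A.example.com", "service-B.example.com", "example.org"]
shared = [h for h in hosts if domain_match(h, "example.com")]
print(shared)  # ['example.com', 'service-A.example.com', 'service-B.example.com']
```

A cookie set without a Domain attribute is host-only and would not be shared this way. This is why the cookie domain setting matters when several subdomains need to share one client ID.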

Setup VIM plugin

Time to revisit my VIM plugin setup, now that VIM has been upgraded to version 8.0.

Previously I was using Vundle, but it is a bit complicated to set up quickly. This time I am using vim-plug.

Jump-start ASP.Net Core with Docker

Environment Setup

I have a typical developer environment: Windows 10 Enterprise x64 (Version 1511, OS build 10586.839), with .NET Core 1.0.1 and VS Code installed. Two extensions are installed in VS Code.

Enable Hyper-V

VirtualBox is no longer needed! Simply enable Hyper-V on Windows 10 by running either of the following commands in PowerShell (as Administrator):

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All
DISM /Online /Enable-Feature /All /FeatureName:Microsoft-Hyper-V

You might need to change the BIOS settings. Read more here.

Note:
The Docker documentation also mentions that virtualization must be enabled, and says you can verify this in Task Manager. However, I could not find the “Virtualization” label in my Task Manager. The following steps worked fine anyway.

Install Docker

Head to the official Docker site, then download and install Docker for Windows. The version I installed was 17.03.1-ce, build c6d412e, Community Edition, from the Edge channel.

Let's verify it.
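A quick scripted check is sketched below. It is a hypothetical helper, not a step from the post. It calls `docker version --format "{{.Server.Version}}"`, which only succeeds when both the Docker CLI and the Docker engine are reachable. Running `docker run hello-world` is the other common smoke test.

```python
import shutil
import subprocess
from typing import Optional


def docker_server_version() -> Optional[str]:
    """Return the Docker engine version, or None if Docker is missing or not running."""
    if shutil.which("docker") is None:
        return None  # Docker CLI not on PATH
    try:
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return None  # the engine did not answer in time
    if result.returncode != 0:
        return None  # CLI present, but the engine is not reachable
    return result.stdout.strip() or None
```

On a working installation this returns a version string such as the `17.03.1-ce` mentioned above. A `None` result points either to the installation or to a Docker service that has not started yet.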

Review and manage your web site disk space in Azure

Problem

We have a large distributed system hosted in Azure. The front-end web applications are Azure web sites.

From time to time, the web applications went down because the Azure web sites ran out of disk space. Our operations team wanted to quickly identify the large files and free up disk space in the Azure web sites.

Luckily, Azure App Service already provides a nice tool for this type of work: the Kudu service.
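Besides its web console, Kudu exposes a REST API on `https://<site>.scm.azurewebsites.net`, including a virtual file system endpoint under `/api/vfs/`. The sketch below walks that tree and reports the largest files. It assumes that directory listings come back as JSON entries with `name`, `size` and `mime` fields, with directories marked `inode/directory`. It also assumes you authenticate with the site's deployment credentials over basic auth. The site name and credentials are placeholders; verify these details against your Kudu version before relying on the script.

```python
import base64
import heapq
import json
import urllib.request
from typing import Callable, List, Tuple
from urllib.parse import quote

Lister = Callable[[str], List[dict]]  # path relative to the VFS root -> directory entries


def make_kudu_lister(site: str, user: str, password: str) -> Lister:
    """Return a function that lists a directory through the Kudu VFS API (basic auth)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    def list_dir(path: str) -> List[dict]:
        url = f"https://{site}.scm.azurewebsites.net/api/vfs/{quote(path)}"
        request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.loads(response.read().decode("utf-8"))

    return list_dir


def largest_files(list_dir: Lister, root: str = "", top: int = 10) -> List[Tuple[int, str]]:
    """Walk the tree under root (a path ending in '/', or '' for the VFS root).

    Returns the `top` largest files as (size in bytes, path) pairs.
    """
    files: List[Tuple[int, str]] = []
    pending = [root]
    while pending:
        directory = pending.pop()
        for entry in list_dir(directory):
            child = f"{directory}{entry['name']}"
            if entry.get("mime") == "inode/directory":
                pending.append(child + "/")  # descend into subdirectories
            else:
                files.append((entry.get("size", 0), child))
    return heapq.nlargest(top, files)
```

Passing in a fake lister (any function that returns canned listings) makes the traversal easy to test offline. Against a real site you would call something like `largest_files(make_kudu_lister("mysite", user, password), "LogFiles/")`, where `mysite` is hypothetical.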

Announcing new version of DNV GL Rules Chatbot

Happy new year!

I am happy to announce that the new version of the Rules Chatbot is now ready. The key new features are:

  • Support full text search (based on SharePoint Search)
  • Indexing DNV GL classification rules documents with enhanced metadata
  • Customized ranking
  • Upgraded to Bot Framework 3.0

URL:
https://dnvgl-rules-bot.azurewebsites.net
