HLO – DB MIGRATION AND STREAM PROCESSING ON AWS

Welcome to a new style of blogging, called the High-level overview (HLO) series. In these blogs, I will describe the problem, which usually is something I came across recently, followed by a high-level solution overview of how it could be solved. The goal is to get you to dig deeper into individual components that make this high-level solution possible.

Very recently, I had a call from one of our architects, who was tasked with comparing different clouds and presenting a solution to his customer. While the customer was going forward with one solution, they were interested in finding out how a solution would be built out in AWS. The goal here was to have a replication environment for the customer’s on-premises SQL Server that it can be failed over to. Moreover, the customer wanted to be able to stream the data out of the SQL environment into an Elastic MapReduce cluster for data analytics purposes. The customer was also concerned about storing large amounts of data in an archive and being able to retrieve it when needed. Needless to say, all the connectivity needed to be secure.

In summary –

Customer Objectives

OBJ.001 – Need to have functional disaster recovery environments for their data

OBJ.002 – Need to have an effective way to do data processing and modeling while keeping costs low

OBJ.003 – Need to have an archival methodology to fulfill long-term storage requirements.

OBJ.004 – Ensure data in transit is secure.

Functional Requirements

FR.001 – A busy SQL Server needs replication, backup and archival to ensure availability, disaster recovery, and long-term storage.

FR.002 – Data from SQL Server needs to be pushed into Elastic MapReduce (EMR) for data processing and modeling.

FR.003 – Data from EMR needs to be archived for long-term storage purposes.

FR.004 – Secure connectivity to all services over which data would traverse from customer on-prem to a cloud provider.

Below is a high-level illustration of the current deployment as I understood it.

 

The Options –

Given the customer objectives and functional requirements, AWS provides multiple products that help us define a solution to satisfy the customer’s use case. Let’s look at these products individually.

1. Amazon RDS – Amazon’s Relational Database Service (RDS) provides a scalable DBaaS offering with the ability to migrate and replicate SQL databases.

2. Amazon S3 – Amazon’s object storage offering, Amazon S3 (Simple Storage Service), allows you to store large volumes of data with virtually unlimited capacity.

3. Amazon Glacier – Amazon Glacier is Amazon’s archival storage offering that allows you to archive PB-scale data at very low cost. Data stored on Amazon S3 can also be archived to Glacier automatically using life-cycle policies.

4. AWS Kinesis – Amazon Kinesis allows you to collect, process and analyze real-time data streams. Data can be analyzed using Kinesis Data Analytics, which allows the use of standard SQL queries. You can push this Kinesis data stream to a stream processing framework like Amazon Elastic MapReduce, which supports Apache Hadoop, Spark, and other big-data frameworks.

5. AWS Lambda – AWS Lambda is Amazon’s serverless technology that allows you to build purpose-built functions that run in response to calls and events.

6. AWS SNS – AWS Simple Notification Service (SNS) allows you to trigger notifications or even AWS Lambda functions based on pre-defined events.

7. AWS VPN Gateway – Allows you to create secure connections between sites.

8. AWS Storage Gateway – Allows you to deploy a virtual machine instance with different storage options on-premises. This virtual machine replicates all data stored on it to an AWS S3 bucket.

9. AWS Snowball – AWS’s solution to migrate large amounts of data using cold migration techniques.

10. AWS Direct Connect – Cost-effective private network solution from on-premises to an AWS datacenter for migrating large data. The solution can also be used to keep network traffic on private networks rather than the internet.

Let’s connect the dots, slowly.

Networking

With AWS VPN Gateway, a customer can securely connect their on-prem environments to AWS regions. This is crucial and fulfills the customer’s FR.004, which requires all data to be secure in transit. Keep in mind that there is a default limit of 5 VPN gateways per region; this limit can be increased by reaching out to AWS Support. Alternatively, the customer can use AWS Direct Connect, which may be a better option in this scenario provided the customer’s data center is close to an AWS Partner Network provider (APN Technology Partner). Direct Connect offers consistent high bandwidth (up to 10 Gbps) and a private connection into your VPC network on AWS, which means traffic does not traverse the internet. Direct Connect is also well suited to real-time streaming data and can be used to seamlessly extend the customer’s network into AWS.

Migration

The customer currently has large sets of data that need to be migrated to AWS. While Direct Connect is an option that allows for high-bandwidth transfers, it can get very expensive. Amazon offers AWS Snowball to help transfer cold data into the AWS cloud. The process is simple. Once you put in a request for a Snowball device, AWS ships a secure device to your data center, where it can be connected to your environment. You can then copy all your data to the device. Once done, you ship the device back to AWS. All data on the device is encrypted. AWS also offers Snowball Edge, which adds compute within the device, allowing you to access your data using a local EC2 instance. AWS Snowball has a limit of 50TB to 80TB while the Edge device has a limit of 100TB.

For PB-scale data migration, AWS offers Snowmobile. This is an 18-wheeler truck carrying a datacenter in a container. The container is transported to your data center and needs to be connected to power and network. Once done, you can copy PB-scale data to this environment before Amazon picks it back up.

For a simpler way to transfer data, AWS offers storage gateways. A storage gateway is a lightweight virtual machine that can be deployed in your environment and configured with your AWS account. The gateway uses a local disk and exposes it as an iSCSI drive that is accessible by other virtual machines. Any data stored on this drive is then replicated to your AWS S3 storage account. Storage gateways can be configured for hot, cold and cached data, so you have a variety of options depending on your use case. The storage gateway appliance is free to download and deploy, so it has a low “barrier to entry” and is a good fit for file transfers.
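To make the trade-off between a network transfer and Snowball a little more concrete, here is a rough back-of-the-envelope sketch in Python. The 80% link utilization and the 500TB data set are assumptions picked purely for illustration, and the function names are hypothetical; the 80TB device capacity is the upper Snowball figure mentioned above.

```python
import math


def network_transfer_days(data_tb: float, link_gbps: float,
                          utilization: float = 0.8) -> float:
    """Days needed to push `data_tb` decimal terabytes over a link.

    `utilization` is an assumed fraction of the link that is really
    available for the copy (protocol overhead, other traffic, etc.).
    """
    data_bits = data_tb * 1e12 * 8                  # terabytes -> bits
    effective_bps = link_gbps * 1e9 * utilization   # usable bits per second
    return data_bits / effective_bps / 86_400       # seconds -> days


def snowball_devices_needed(data_tb: float,
                            device_capacity_tb: float = 80) -> int:
    """Number of devices required, rounding up to whole devices."""
    return math.ceil(data_tb / device_capacity_tb)


if __name__ == "__main__":
    data_tb = 500  # assumed size of the data set to migrate
    for gbps in (1, 10):
        days = network_transfer_days(data_tb, gbps)
        print(f"{data_tb} TB over {gbps} Gbps: about {days:.1f} days")
    print(f"Snowball devices needed: {snowball_devices_needed(data_tb)}")
```

With these assumed numbers, a 1 Gbps link needs close to two months for the initial copy, which is the kind of situation where shipping devices starts to look attractive. Shipping and import time for the devices is not modeled here.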

Storage and Archival

AWS’s S3 (Simple Storage Service) is an object data store that offers virtually unlimited storage for files. S3 storage, like any object storage, is accessible over HTTP and HTTPS and stores data securely in AWS’s datacenters. The data is replicated locally but can also be replicated across regions (cross-region replication) to increase availability. You can even serve these files directly from S3 into your application. An interesting feature of S3 is its ability to apply life-cycle policies to your files, which are stored in “S3 buckets”. You can set a life-cycle policy to archive all files after a set amount of time, and S3 will move them over to Amazon Glacier, Amazon’s low-cost long-term archival solution.
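As an illustration of such a life-cycle policy, the sketch below builds a rule that transitions objects to Glacier after a number of days, in the shape that boto3’s `put_bucket_lifecycle_configuration` call expects. The 90-day threshold, the rule ID, and the helper names are assumptions made up for this example; treat it as a starting point rather than a finished configuration.

```python
def glacier_lifecycle_config(days: int = 90, prefix: str = "") -> dict:
    """Build a life-cycle configuration that archives objects to Glacier.

    `days` and `prefix` are illustrative defaults; an empty prefix
    applies the rule to every object in the bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket: str, s3_client=None, days: int = 90) -> None:
    """Attach the rule to `bucket`; needs AWS credentials when run for real."""
    if s3_client is None:
        import boto3  # imported lazily so the builder works without boto3
        s3_client = boto3.client("s3")
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle_config(days),
    )
```

Keeping the rule builder separate from the API call makes it easy to review or unit test the policy before it touches a real bucket.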

Alerting

AWS’s Simple Notification Service (SNS) can be configured to alert the customer based on custom or pre-set triggers. You will find SNS being used in almost everything in AWS. For instance, when you create a new AWS account, the first thing to do is to create an SNS billing alert to ensure that you don’t exceed billing thresholds. SNS can also trigger or get triggered by other AWS services such as Lambda functions (serverless).

Serverless

AWS Lambda is Amazon’s serverless technology, which allows you to run small, single-purpose functions in response to events or triggers. You can trigger a Lambda function to perform a certain task. For example, I can have an SNS alert to ensure that billing does not exceed $100 per day. If it does, I can have an event trigger sent to a Lambda function that will immediately shut down my instances to save on billing costs.
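The billing example above could be sketched roughly as follows. This is a minimal illustration, not a production handler: the `daily_spend_usd` field in the SNS message is a hypothetical payload invented for this example (a real billing alarm delivers a different message format), the $100 limit comes from the example above, and pagination of `describe_instances` is ignored for brevity.

```python
import json

DAILY_LIMIT_USD = 100.0  # threshold taken from the example in the text


def running_instance_ids(ec2) -> list:
    """Return the IDs of all running instances (first page only)."""
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def handle(event: dict, ec2) -> list:
    """Stop running instances if any SNS record reports overspend.

    Returns the list of instance IDs that were asked to stop.
    """
    for record in event.get("Records", []):
        # SNS delivers the published message as a string under Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        # "daily_spend_usd" is an assumed, hypothetical field name.
        if float(message["daily_spend_usd"]) > DAILY_LIMIT_USD:
            ids = running_instance_ids(ec2)
            if ids:
                ec2.stop_instances(InstanceIds=ids)
            return ids
    return []


def lambda_handler(event, context):
    """Entry point Lambda would call; builds the real EC2 client."""
    import boto3  # available in the Lambda Python runtime
    return handle(event, boto3.client("ec2"))
```

Passing the EC2 client into `handle` keeps the decision logic testable with a stub, so the shutdown behavior can be checked without touching real instances.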

Data analysis

AWS Kinesis is a solution for real-time data stream analysis. Real-time data can be collected and analyzed using Kinesis Data Analytics, which can be helpful for this customer because it allows regular SQL queries to be used to analyze the data. This data can also be pushed to a stream processing framework such as Amazon Elastic MapReduce for big-data analysis before being archived.
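To give a feel for the producer side of this pipeline, here is a small sketch that turns database rows into Kinesis records and publishes them with boto3’s `put_record`. The stream name, the `id` key column, and the row layout are assumptions for illustration only; how rows are actually captured from SQL Server is outside the scope of this sketch.

```python
import json


def row_to_record(row: dict, key_column: str = "id") -> dict:
    """Convert one database row into put_record keyword arguments.

    The partition key decides which shard receives the record, so rows
    that share a key keep their relative order within that shard.
    """
    return {
        # default=str lets values such as datetimes serialize as text
        "Data": json.dumps(row, default=str).encode("utf-8"),
        "PartitionKey": str(row[key_column]),
    }


def publish_rows(rows, stream_name: str, kinesis) -> int:
    """Send each row to the stream one by one; returns the number sent."""
    sent = 0
    for row in rows:
        kinesis.put_record(StreamName=stream_name, **row_to_record(row))
        sent += 1
    return sent
```

In a real deployment the client would come from `boto3.client("kinesis")`, for example `publish_rows(rows, "sql-changes", boto3.client("kinesis"))`, where `sql-changes` is a made-up stream name. For higher volumes, batching with `put_records` would likely be a better fit than one call per row.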

Databases

AWS RDS (Relational Database Service) offers a managed database environment that can be readily consumed. You simply deploy database instances and pick a database flavor. Flavors such as Oracle and SQL Server are supported. This fulfills the customer’s use case, where there was a need to migrate a SQL database to a remote instance for disaster recovery purposes. RDS supports SQL replication, so changed data is replicated from your primary database. You can even have read-only RDS instances and perform disaster recovery tests to fulfill your business continuity plan (BCP).

Putting it all together

 

I encourage you to read more about the different solutions discussed in this blog post. Feel free to comment.

Some important links

AWS Networking

AWS Migration

AWS RDS

Google in Talks with Rackspace for VMware Support

Today, news hit the stands that VMware was going to offer its VMware “cloud” services on both Microsoft Azure and Google Cloud Platform.

Another piece of information that came out is that Google is in talks with Rackspace to provide VMware support for “VMware on GCP” as a product. A senior VMware executive, who wanted to remain anonymous, spoke with me about Google’s concerted effort to get Rackspace to support any and all VMware deployments on GCP. Executives of both companies met about three weeks back in San Antonio to discuss and begin early product development to build an operational support model.

Rackspace has one of the world’s largest hosted VMware footprints, with about 98,000 single-tenant virtual machines and growing at around 2,500 virtual machines a month. The VMware business generates an estimated $450 million a year for Rackspace, and there is no doubt that GCP is looking to get a good chunk of it.

Interestingly, there has been little discussion of VMware support on the Azure platform, so that is yet to be seen.

All this will look good for Apollo Equity when it takes Rackspace public. They are already halfway there, and insider sources say that they will go public before the end of this year and are aiming for an asking price of $78 a share!

 

Azure Stack Services

Azure Stack does not do “everything” that public cloud Azure does. Only a subset of services is available today, but this is a growing list. Many customers are confused about this, and it is important to clarify it before any Azure Stack conversations begin.

Azure services on Azure Stack include PaaS and IaaS. Azure Stack is best used when you want these PaaS services in your on-prem datacenter.

PAAS

Azure App Services – All your websites, easy to host service

Azure Functions – Serverless deployment of your code

Service Fabric – Azure Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices and containers.

Kubernetes – Deploy Kubernetes clusters and manage them.

Cloud Foundry – Pivotal cloud foundry on Azure stack.

IAAS

Virtual Machines – Linux and Windows virtual machines including VM scale sets.

Docker Containers – Linux and Windows containers.

Networking – Virtual network, load balancer, and VPN Gateway.

Storage – Blobs, Tables, Queues and Disks.

Key Vault – Application keys and secrets for password encryption and checkouts.

Hope this quick write up helps.

Learning VMware NSX – VCP6-NV

It is always amazing to hear feedback about the book from my readers. Today I heard from a Senior Engineer that he passed the VCP6-NV and that the Learning VMware NSX book series was of great help. This is great to hear. As an author, it motivates me to write more so we can keep helping each other.


Last Year 2017

It feels as fresh as if it were yesterday: last year, in 2017, I was with Rackspace and was proud to represent the company at DellEMC World 2017. I was the face of the company’s main marketing campaign and it was an exciting time for me.

Enjoy the video

VMware’s Virtual Cloud Network And The Emphasis Towards Networking

Dell EMC World is underway and new announcements and products are expected. Well, it seems we weren’t disappointed, with VMware’s announcement introducing the Virtual Cloud Network.

VMware’s Virtual Cloud Network is built on technologies such as NSX and is an agile network and security fabric that connects apps, data and users regardless of their placement. This means apps running on AWS, GCP and private cloud can be brought under one network and security fabric, simplifying the scope of security and management. VMware is pushing towards the idea of having a single network fabric that can extend to multiple clouds and environments. The aim here is to have the network evolve into a secure, programmable and flexible fabric.

VMware’s Virtual Cloud Network suite of products consists of:

  1. NSX SD-WAN – Powered by the recently acquired VeloCloud, NSX SD-WAN provides the solution for connectivity and security across the WAN. With NSX-driven SD-WAN, you can now extend segmentation beyond your datacenter and seamlessly into multiple clouds/endpoints.
  2. NSX Cloud – Provides network and security solutions for public and private clouds.
  3. NSX Datacenter – Provides networking solutions for within the datacenter including containers and bare metal environments.
  4. NSX Hybrid Connect – Enables connectivity and seamless mobility between private and public clouds.

The Emphasis Towards Networking – VMware is finally coming in with a complete suite of networking products as it sets its sights on raising its profit margin by dominating the SDN market. For many years VMware was seen as a company that did not get networking right, and with the launch of the Virtual Cloud Network VMware is changing that perception. VMware has recently been noted to have earned significant business with NSX, and these investments in networking are a clear sign that network and security are part of the inner circle that is crucial for VMware’s success.

Analysts estimate that the network and security market will grow at a compound annual growth rate of 4.74% from 2017 to 2022 and the market today is estimated to be around $5 billion and growing. With VMware taking the prime spot of being the leader that ties in network and security across multiple clouds, it shouldn’t be a surprise if it takes a bigger chunk of the network and security market.

Getting Started with NSX-T

NSX-T is heating up and is quite exciting! If multi-cloud is your forte, and we know the drive towards multi-cloud adoption is increasing, here is a good way to get started!

NET 1510BU: Introduction to NSX-T

Speakers: Andrew Voltmer, Dimitri Desmidt

Andrew and Dimitri will provide details on the NSX-T platform and its capabilities across various environments.

Download NSX-T Introduction

VSAN NETWORK CHATTER

What goes on on the vSAN network? Let’s take a brief look so we can understand the different types of chatter that occur on this network.

First things first, there is the communication that takes place between all the hosts participating in a vSAN cluster. A heartbeat is sent from the master node to all the other nodes participating in a vSAN cluster. Since vSAN 6.6, this communication is done via unicast traffic.

When a host is part of the vSAN cluster, it gets one of three roles: master, agent, or backup. As an admin, you have no control over which host becomes a master vs a backup; this is completely handled by vSAN. This is the second type of communication that happens between the hypervisors participating in a vSAN cluster. The master node is responsible for sending Cluster Monitoring, Membership and Directory Services (CMMDS) updates to all nodes. This traffic is unicast since vSAN 6.6. The volume of traffic between the master, agent, and backup is light and steady, so high bandwidth is not a concern.

The majority of traffic on a vSAN network comes from virtual machine disk I/O. A VM on the vSAN datastore is made up of a set of objects, each of which is made up of one or more components. When a VM has multiple copies, its replicas traverse the vSAN network to other nodes. This is unicast traffic and forms the majority of the vSAN network traffic.

Best practice for the vSAN network is to have a minimum of 10Gb and no routing. If the traffic must be routed, use only static routes in the environment, although this is not recommended. Also, do not put vSAN traffic on an NSX overlay network; because of the circular dependency, this configuration is NOT supported.

CAN VSAN NETWORK RUN ON VXLANS?

An interesting question: can vSAN networking be configured on VXLANs backed by NSX?

The answer is No and this is to avoid a circular dependency.

“However, very often, the question of compatibility is asked in the context of being able to place the vSAN network traffic on an NSX managed VxLAN/Geneve overlay. In this case, the answer is no, NSX does not support the configuration of the vSAN data network traffic over an NSX managed VxLAN/Geneve overlay. This is not unique to vSAN. The same restriction applies to any statically defined VMkernel interface traffic such as vMotion, iSCSI, NFS, FCoE, Management, etc.

Part of the reason for not supporting VMkernel traffic over the NSX managed VxLAN overlay is primarily to avoid any circular dependency of having the VMkernel infrastructure networks dependent on the VxLAN overlay that they support. The logical networks that are delivered in conjunction with the NSX managed VxLAN overlay are designed to be used by virtual machines which require network mobility and flexibility.”

Now you know..