
Monday, 31 December 2018

Route53: Domain Name System (DNS) from AWS

Route53 is the Domain Name System (DNS) service provided by AWS. Below are some basic points regarding Route53:

1. Domain Name System (DNS): Translates human-readable names like www.example.com into numeric IP addresses like 192.0.2.1.

2. Why "53" in name? This services is named Route53 as port 53 belongs to TCP/UPD and mainly handles DNS queries.

3. Routes traffic based on multiple criteria, such as endpoint health, geographic location, and latency, ensuring end users are routed to the closest healthy endpoint for your application.

4. Routing Policies: Simple, Weighted (for example, 75% of traffic to one server and 25% to the other), Latency-based, Failover, and Geolocation (a weighted-routing sketch follows this list).

5. Configure DNS health checks to route traffic to healthy endpoints or to independently monitor the health of your application and its endpoints. Route53 re-routes your users to an alternate location if your primary application endpoint becomes unavailable.

6. Also offers Domain Name Registration.

7. Record Sets: NS, SOA, A, AAAA, CNAME
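
As an illustration of the weighted routing policy mentioned in point 4, below is a minimal boto3 sketch that creates two weighted A records for the same name, sending roughly 75% of queries to one IP and 25% to the other. The hosted zone ID, domain name and IP addresses are placeholders.

    import boto3

    route53 = boto3.client("route53")

    # Two weighted A records for the same name: ~75% of queries go to the first
    # endpoint and ~25% to the second (zone ID, name and IPs are placeholders).
    for identifier, ip, weight in [("primary", "192.0.2.10", 75), ("secondary", "192.0.2.20", 25)]:
        route53.change_resource_record_sets(
            HostedZoneId="Z1EXAMPLE",
            ChangeBatch={
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "www.example.com",
                        "Type": "A",
                        "SetIdentifier": identifier,
                        "Weight": weight,
                        "TTL": 60,
                        "ResourceRecords": [{"Value": ip}],
                    },
                }]
            },
        )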

CloudFront: Content Delivery Network (CDN) from AWS

CloudFront is the Content Delivery Network service provided by AWS. Below are some basic points regarding CloudFront:

1. Distribution service / Content Delivery Network (CDN) from AWS.

2. Edge Location: The CloudFront network has 160 points of presence (PoPs) as of now.

3. Edge server caches the data to improve latency and lower the load on your origin servers. 

4. Highly programmable and customizable content delivery with Lambda@Edge: Lambda@Edge functions, triggered by CloudFront events, extend your custom code across AWS locations worldwide, allowing you to move even complex application logic closer to your end users to improve responsiveness.

5. CDN Origins: S3, EC2, ELB, or any custom HTTP server (Route53 is typically used to point your own domain at the distribution rather than as an origin).

6. TTL (Time to Live): How long your content is cached at an Edge Location, defined in seconds. Default TTL: 24 hours (86400 seconds); Maximum TTL: 365 days (31536000 seconds); Minimum TTL: 0.

7. Price Class: Use All Edge Locations (best performance); Use Only US, Canada and Europe; or Use US, Canada, Europe, Asia and Africa. Pricing is charged accordingly.

8. Default CloudFront URL: *.cloudfront.net

9. Protocols supported: HTTP and HTTPS (CloudFront does not serve content over FTP).

10. Geo-restriction lets you blacklist or whitelist users based on their geographic location.

11. Clearing cached content from Edge Locations (an invalidation) is chargeable beyond a small free monthly allowance, as shown below.
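
To illustrate point 11, here is a minimal boto3 sketch that clears cached objects from Edge Locations by creating an invalidation. The distribution ID is a placeholder.

    import time
    import boto3

    cloudfront = boto3.client("cloudfront")

    # Invalidate every cached object in a distribution (distribution ID is a placeholder).
    cloudfront.create_invalidation(
        DistributionId="E1EXAMPLE",
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )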

Sunday, 30 December 2018

AWS Compute Services: EC2, Elastic Beanstalk, Lambda and ECS

EC2 (Elastic Compute Cloud), Elastic Beanstalk, Lambda and ECS (Elastic Container Service) are the compute service offerings from AWS. Below are some basic points regarding these AWS compute services:

EC2

1. The most commonly used AWS service, called Elastic Compute Cloud.

2. It is essentially a virtual server in AWS.

3. Categories of EC2:
  • On Demand Instances (pay-as-you-go; billed per hour or per second)
  • Spot Instances (bid-based; choose these when start and end times are flexible)
  • Reserved Instances (1 year or 3 year contract, cheaper than on-demand)
  • Scheduled Reserved Instances (Scheduled Instances)
  • Dedicated Host and Instances
4. EC2 Types:
  • General Purpose (T2, M5)
  • Compute Optimized (C5)
  • Memory Optimized (X1, R4)
  • Storage Optimized (I3, D2)
  • Accelerated Computing / GPU Optimized (P3, G3, F1)
Please note that EC2 is one of the most important topics in AWS, so for details, please go through the official documentation. A minimal launch example follows.
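
As a small, hedged example of working with EC2 programmatically, the boto3 sketch below launches a single On-Demand t2.micro instance. The AMI ID and key pair name are placeholders; substitute values from your own account and region.

    import boto3

    ec2 = boto3.client("ec2")

    # Launch one On-Demand t2.micro instance (AMI ID and key pair are placeholders).
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "demo-instance"}],
        }],
    )
    print(response["Instances"][0]["InstanceId"])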

Elastic Beanstalk

1. A simple way to deploy your application on AWS, without the headache of managing the infrastructure yourself.

2. Just upload your application code and the service automatically handles all the details such as resource provisioning, load balancing, auto-scaling, and monitoring.

3. Supports PHP, Java, Python, Ruby, Node.js, .NET, Go and Docker.

4. Elastic Beanstalk uses core AWS services such as Amazon EC2, Amazon Elastic Container Service (Amazon ECS), Auto Scaling, and Elastic Load Balancing to support your applications.

5. Monitor and manage the health of your applications.

Lambda

1. Lambda lets you run code without managing any server (Go Serverless). Just upload your code to Lambda or write your code in Lambda Code Editor and it takes care of everything required to run it.

2. Any code uploaded to Lambda becomes a Lambda function. Code should be written in a stateless style; if you need to keep any state between invocations, save it to S3, DynamoDB, etc. and retrieve it from there.

3. Lambda can be directly triggered by AWS services such as S3, DynamoDB, Kinesis, SNS, CloudWatch, API Gateway and web applications (a minimal S3-triggered handler sketch follows this list). Use cases: https://aws.amazon.com/lambda/

4. Languages supported: C#, Java, Python, Ruby, Go, Powershell, Node.js

5. You pay only for the compute time you consume: you are charged for every 100 ms your code executes and for the number of times your code is triggered. There is no charge when your code is not running.
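
As referenced in point 3, here is a minimal sketch of a stateless Python Lambda handler triggered by S3 "ObjectCreated" events. The bucket and object names come from the event itself; everything else is illustrative.

    import json
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Triggered by S3 "ObjectCreated" notifications; the function keeps no local state.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            obj = s3.get_object(Bucket=bucket, Key=key)
            size = len(obj["Body"].read())
            print(f"New object s3://{bucket}/{key} is {size} bytes")
        return {"statusCode": 200, "body": json.dumps("done")}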

ECS

1. Elastic Container Service (ECS) is a container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS. 

2. Amazon ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.

3. Containers without Servers: With Fargate, you no longer have to select Amazon EC2 instance types to run your containers.

4. Amazon ECS launches your containers in your own Amazon VPC, allowing you to use your VPC security groups and network ACLs. 

5. With simple API calls, you can launch and stop Docker-enabled applications, query the complete state of your application, and access many familiar features such as IAM roles, security groups, load balancers, Amazon CloudWatch Events, AWS CloudFormation templates, and AWS CloudTrail logs.
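
To make the API-driven workflow in point 5 concrete, below is a minimal boto3 sketch that registers a simple task definition and runs it on an existing cluster. The cluster name, task family and container image are placeholders, and the EC2 launch type is assumed.

    import boto3

    ecs = boto3.client("ecs")

    # Register a simple task definition for an nginx container (names are illustrative).
    ecs.register_task_definition(
        family="demo-web",
        containerDefinitions=[{
            "name": "web",
            "image": "nginx:latest",
            "memory": 256,
            "essential": True,
            "portMappings": [{"containerPort": 80}],
        }],
    )

    # Run one copy of the task on an existing cluster using the EC2 launch type.
    ecs.run_task(
        cluster="demo-cluster",
        taskDefinition="demo-web",
        count=1,
        launchType="EC2",
    )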

Thursday, 27 December 2018

AWS Data Transport Solution: Snowball, Snowball Edge and Snowmobile (Data Truck)

It can cost thousands of dollars to transfer 100 terabytes of data using high-speed Internet. The same 100 terabytes of data can be transferred using two Snowball devices for as little as one-fifth the cost of using the Internet. For example, 100 terabytes of data will take more than 100 days to transfer over a dedicated 1 Gbps connection. That same transfer can be accomplished in less than one week, plus shipping time, using two Snowball devices.

Below are some basic points to remember about Snowball: 

1. Snowball is a petabyte-scale data transport solution to transfer large amounts of data into and out of the AWS Cloud. Even with high-speed Internet connections, it can take months to transfer large amounts of data. 

2. One Snowball device can hold approximately 50 TB of data (an 80 TB model is also available).

3. With Snowball, you don’t need to write any code or purchase any hardware to transfer your data. Create a job in the AWS Management Console ("Console") and a Snowball device will be automatically shipped to you. Once it arrives, attach the device to your local network, download and run the Snowball Client ("Client") to establish a connection, and then use the Client to select the file directories that you want to transfer to the device. The Client will then encrypt and transfer the files to the device at high speed. Once the transfer is complete and the device is ready to be returned, the E Ink shipping label will automatically update and you can track the job status via Amazon Simple Notification Service (SNS), text messages, or directly in the Console.

4. Snowball Edge: 100 TB (storage as well as compute functionality). Local compute equivalent to EC2 (m4.large) instance.

5. Snowmobile: Data-truck with storage up to 100 PB.

Tuesday, 25 December 2018

Kinesis, Firehose and MapReduce: AWS Data Analytics Service

Kinesis, Firehose and Elastic MapReduce are very useful data analytics offerings from AWS. 

You can capture real-time data and analyze it in parallel using Kinesis and Firehose, with no need to first load the data into a warehouse and then run analytics. Below are some basic and important points about Kinesis and Firehose to remember:

1. Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. 

2. With Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. 

3. Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

4. With Kinesis, you can perform real-time analytics on data that has been traditionally analyzed using batch processing in data warehouses. The most common use cases include data lakes, data science and machine learning. 

5. There is no need to first save data into a warehouse and then run analytics, and no need for batch processes; everything is done in real time.

6. Types: Kinesis Data Streams and Kinesis Video Streams; Kinesis Data Firehose (which, unlike plain streams, can also transform data before delivery); and Kinesis Data Analytics (which takes data from Streams or Firehose and runs SQL queries on it; you pay only for the queries you run).

“Kinesis Video/Data Streams” vs “Firehose”

1. Firehose is fully managed and scales automatically, whereas with Kinesis Streams you provision and manage shards yourself.

2. Firehose PREPARES and LOADS data streams into S3, Redshift, Elasticsearch, Kinesis Data Analytics and Splunk, whereas Kinesis Streams just STORES the data (for 1-7 days) and you need to write applications using Lambda, EC2, Kinesis Data Analytics or Spark to PROCESS it. A producer-side sketch follows.
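
For the producer side of a Kinesis data stream, here is a minimal boto3 sketch that pushes one clickstream event into a stream. The stream name and event fields are placeholders, and the stream is assumed to already exist.

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    # Push one clickstream event into an existing stream named "demo-stream".
    kinesis.put_record(
        StreamName="demo-stream",
        Data=json.dumps({"user": "u-42", "page": "/pricing"}).encode("utf-8"),
        PartitionKey="u-42",  # records with the same key land on the same shard
    )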

For more details, please visit documentation.

EMR (Elastic MapReduce)

1. Big data analysis service

2. Used by data scientists for log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

3. EMR provides a managed Hadoop framework that lets you process vast amounts of data across dynamically scalable Amazon EC2 instances.

4. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

AWS Application Integration Services: SQS, SNS and SWF

We will look at some basic points of AWS Application Integration services like SQS, SNS and SWF.

SQS (Simple Queue Service)

1. Fully managed message queues for microservices, distributed systems, and serverless applications.

2. Enables application components and microservices to communicate with each other.

3. Pull-based system (consumers poll the queue to retrieve messages).

4. Queue Types: Standard and FIFO

5. Can be used with Redshift, DynamoDB, RDS, EC2, ECS, Lambda, S3 and SNS.

6. Multiple copies of every message are stored redundantly across multiple availability zones so that they are available whenever needed.

For more details, you can read the documentation.
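
As a quick illustration of the pull-based model described above, here is a minimal boto3 sketch that creates a queue, sends a message, and then polls for and deletes messages. The queue name and message body are placeholders.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = sqs.create_queue(QueueName="demo-queue")["QueueUrl"]

    # Producer: push a message onto the queue.
    sqs.send_message(QueueUrl=queue_url, MessageBody="order-12345 created")

    # Consumer: long-poll for up to 10 messages, process them, then delete them.
    messages = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10
    ).get("Messages", [])
    for msg in messages:
        print("processing:", msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])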

SNS (Simple Notification Service)

1. Fully managed pub/sub messaging service for microservices, distributed systems, and serverless applications.

2. Push-based, many-to-many messaging.

3. Publishers to Topic: EC2, S3, RDS, CloudWatch

4. Subscribers to Topic: Serverless functions (Lambda), Queues (SQS), HTTP/S endpoints and distributed systems. Additionally, SNS fans out notifications to end users via mobile push messages, SMS, and email.

5. SNS uses cross availability zone message storage to provide high message durability. 

6. SNS Topic owners can keep sensitive data secure by setting topic policies that restrict who can publish and subscribe to a topic.
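
Here is a minimal boto3 sketch of the push-based pub/sub flow described above: create a topic, subscribe an endpoint, and publish a message that SNS fans out to all subscribers. The topic name and email address are placeholders.

    import boto3

    sns = boto3.client("sns")

    # Create a topic, subscribe an email endpoint, and publish to all subscribers.
    topic_arn = sns.create_topic(Name="demo-alerts")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")
    sns.publish(
        TopicArn=topic_arn,
        Subject="Disk space alert",
        Message="Volume usage crossed 80% on demo-instance.",
    )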

SWF (Simple Workflow Service)

1. SWF lets you write your application components and coordination logic in any programming language and run them in the cloud or on-premises.

2. SWF creates a logical separation between tasks and components and acts as a task coordinator. 

Monday, 24 December 2018

AWS Storage Services: S3, Glacier, EBS, EFS, FSx and Storage Gateway

AWS provides following services under storage section:

1. S3
2. Glacier
3. EBS
4. EFS
5. FSx
6. Storage Gateway

Following are some basic and important points about AWS Storage services:

S3

1. Cloud storage service like Dropbox and Google Drive.

2. Object-based storage, not block-level storage (like EBS and EFS). Data is treated as objects. A single S3 object can be up to 5 TB, and large objects are uploaded in multiple parts (multipart upload). You cannot install an OS or software on it.

3. Buckets: Data is stored in buckets, which are similar to folders. Bucket names must be lowercase (letters, numbers and hyphens) and globally unique. By default, a bucket is private.

4. Versioning: Versioning takes more space because each version of an object is stored individually. Versioning must be enabled on both the source and destination buckets for cross-region replication. Once versioning is enabled, it can't be disabled, only suspended.

5. Storage Class
  • Standard (frequently accessed data; no minimum storage duration; 99.999999999% (11 nines) durability)
  • Intelligent-Tiering (long-lived data with changing or unknown access patterns)
  • Standard-IA (long-lived, infrequently accessed data; minimum storage duration: 30 days; 99.999999999% (11 nines) durability)
  • One Zone-IA (long-lived, infrequently accessed, non-critical data stored in a single Availability Zone)
  • Glacier (data archiving with retrieval times ranging from minutes to hours; minimum storage duration: 90 days)
  • Reduced Redundancy (not recommended; frequently accessed, non-critical data that can be regenerated if lost)
6. Encryption:
  • SSE-S3 (S3-managed keys, AES-256 encryption)
  • SSE-KMS (keys managed through the Key Management Service)
  • SSE-C (Server-Side Encryption with Customer-Provided Keys)
7. Bucket URL syntax: https://s3.regionname.amazonaws.com/bucketname/objectname

8. Eventual Consistency: When you upload a new file to S3, it becomes available immediately (read-after-write consistency), but overwrite and delete operations are eventually consistent, so there can be a short delay before the change is visible. When a file upload to S3 is successful, it returns an HTTP 200 status.

9. Security: Data is secured using ACLs (Access Control Lists) and bucket policies at the bucket or object level. You can write custom bucket policies in JSON (see the sketch after this list).

10. Transfer Acceleration: Enables fast uploads of data to an S3 bucket over long distances by routing uploads through CloudFront edge locations.

11. Lifecycle Management: You can manage the transition of objects from one storage class to another using lifecycle rules. For example, you can move an object from the Standard storage class to Standard-IA after some days (minimum 30 days) if it is no longer frequently used, and further move it to Glacier for archival after some more days (also covered in the sketch after this list).

12. Static Website Hosting: You can host a static website from a bucket and map a custom domain to it using Route53.
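
To tie points 9 and 11 together, below is a minimal boto3 sketch that attaches a custom JSON bucket policy (allowing public read of objects) and a lifecycle rule that transitions objects to Standard-IA after 30 days and to Glacier after 90 days. The bucket name is a placeholder.

    import json
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-demo-bucket"  # placeholder bucket name

    # Point 9: custom bucket policy in JSON allowing public read of all objects.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

    # Point 11: lifecycle rule moving objects to Standard-IA after 30 days
    # and archiving them to Glacier after 90 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "tier-then-archive",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }],
        },
    )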

Glacier 

1. Data Backup and Archive

2. Types of data: Hot data (which we need on a daily basis) and cold data (which we don't need on a daily basis; archive this data to Glacier).

3. Retrieval is not immediate; a standard retrieval may take 3-5 hours.

4. Minimum storage duration in Glacier is 90 days. Archives deleted before 90 days incur a pro-rated charge equal to the storage charge for the remaining days.

EBS

1. Elastic Block Store (like the hard disk of your laptop; unlike S3, it can only be used by attaching it to an EC2 instance).

2. A backup of an EBS volume is called a snapshot and is taken incrementally. Snapshots are point-in-time copies of your volumes and are saved to S3 (a snapshot sketch appears at the end of this list).

3. To take a backup of the root EBS volume (where the OS is running), you should stop the instance first for data integrity.

4. Root EBS can’t be encrypted and “Delete on Termination” is checked by default.

5. To share snapshots with other AWS accounts, make sure the snapshots are NOT encrypted.

6. An EBS volume lives in a single Availability Zone; spanning multiple Availability Zones is NOT supported.

7. You cannot attach one EBS volume to multiple EC2 instances; use EFS for that.

8. RAID0, RAID1 and RAID10 (combination of both) are preferred. RAID5 is discouraged.

9. EBS Volume Types
  • General Purpose SSD (gp2) volumes can burst to 3,000 IOPS and deliver a consistent baseline of 3 IOPS/GiB.
  • Provisioned IOPS SSD (io1) volumes can deliver up to 64,000 IOPS and are best for EBS-optimized instances.
  • Throughput Optimized HDD (st1) – for frequently accessed, throughput-intensive data
  • Cold HDD (sc1) – for infrequently accessed data
  • Magnetic volumes (previously called standard volumes) deliver about 100 IOPS on average, can burst to hundreds of IOPS, and are the lowest cost.
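
As mentioned in point 2, here is a minimal boto3 sketch that takes a point-in-time snapshot of an EBS volume. The volume ID is a placeholder.

    import boto3

    ec2 = boto3.client("ec2")

    # Create an incremental, point-in-time snapshot of a volume (ID is a placeholder).
    snapshot = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="nightly backup of data volume",
    )
    print(snapshot["SnapshotId"], snapshot["State"])
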
EFS

1. Elastic File System, somewhat like EBS.

2. EFS can be mounted on several EC2 instances and on-premises servers at the same time, unlike EBS.

3. EFS currently only works with Linux, not with Windows.

4. An EBS volume has a fixed provisioned size, while EFS grows and shrinks automatically as required.

5. Coming soon, the Amazon EFS Infrequent Access storage class.

6. EBS and EFS cannot be used as an origin for CDN unlike S3.

7. EBS and EFS are faster than S3 as these are directly mounted on EC2.

Storage Gateway

1. Integrates on-premises data center storage with cloud storage.

2. It connects to AWS storage services, such as Amazon S3, Amazon Glacier, and Amazon EBS, providing storage for files, volumes, and virtual tapes in AWS.

3. Storage Gateway is downloaded and installed on-premises as a virtual appliance.

4. It also provides caching and monitoring of data.

5. File Gateway: Simple file storage using NFS (Network File System) protocol.

6. Volume Gateway: Hard disk / block storage with two modes: cached volumes (frequently accessed data is cached on the Volume Gateway while the entire data set is in the cloud) and stored volumes (the entire data set stays in the data center and is asynchronously backed up to the cloud).

7. Tape Gateway: A virtual tape library (VTL) that lets existing backup applications archive data to S3 and Glacier.

Sunday, 23 December 2018

AWS Database Services: RDS, DynamoDB, ElastiCache, Neptune and Redshift

AWS provides following services under database section:

1. RDS
2. DynamoDB
3. ElastiCache
4. Neptune
5. Redshift

Following are some basic and important points about AWS Database services:

RDS

1. Relational Database Service. Supports Aurora, MySQL, MariaDB, PostgreSQL, Oracle, MS SQL Server.

2. Backup and Restore methods: Automated (done by AWS automatically, backs up data with transaction logs) and Snapshots (manual process, usually done by system admins).

3. To improve database performance, you can use ElastiCache, DAX (for DynamoDB) and Read Replicas (a read-replica sketch follows).
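
As a small sketch for point 3, the boto3 call below creates a read replica of an existing RDS instance so read-heavy traffic can be offloaded from the primary. The instance identifiers and class are placeholders.

    import boto3

    rds = boto3.client("rds")

    # Create a read replica of an existing instance (identifiers are placeholders).
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="demo-db-replica",
        SourceDBInstanceIdentifier="demo-db",
        DBInstanceClass="db.t2.micro",
    )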

Aurora 

1. A MySQL- and PostgreSQL-compatible relational database (RDBMS based).

2. Up to 5 times faster than standard MySQL databases and 3 times faster than standard PostgreSQL databases.

3. Automatically scales up to 64TB on SSD per database instance.

4. Replicates 6 copies of database across 3 Availability Zones.

5. Each DB cluster can have up to 15 READ replicas.

6. Failover takes less than 30 seconds.

7. Backs up database to S3.

8. You can monitor database performance using Amazon CloudWatch.

DynamoDB

1. DynamoDB (also abbreviated DDB) is Amazon's NoSQL database.

2. DynamoDB Security is provided by Fine-Grained Access Control (FGAC) mechanism. FGAC is based on the AWS IAM.

3. DynamoDB Triggers integrate with AWS Lambda.

4. DynamoDB Streams provides a 24-hour chronological sequence of updates to items in a table. AWS Lambda can read updates to a table from a stream.

5. DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB.
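
Here is a minimal boto3 sketch of working with a DynamoDB table: write an item, then read it back by its key. It assumes a table named "Users" with a partition key "user_id" already exists; both names are placeholders.

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Users")  # assumed table with partition key "user_id"

    # Write an item, then read it back by its key.
    table.put_item(Item={"user_id": "u-42", "name": "Alice", "plan": "pro"})
    item = table.get_item(Key={"user_id": "u-42"}).get("Item")
    print(item)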

Neptune

1. Graph database service.

Redshift 

1. Data Warehouse and Reporting System in the Amazon Cloud.

2. Use OLAP (Online Analytical Processing), SQL and BI tools to analyze the data.

3. Redshift Spectrum extends the power of Redshift to query unstructured data in S3 (without loading your data into Redshift).

ElastiCache

1. In-memory database cache in the Amazon Cloud for fast performance.

2. ElastiCache Engines: Redis, Memcached

DMS

1. Database Migration Service with zero/negligible downtime.

2. Supports homogenous (example: Oracle to Oracle) and heterogeneous (example: Oracle to Aurora or MySQL) database migration.
