Form your cloud #1 – Please smile! Highly available web camera photo upload to an S3 bucket


This article shows how to create a simple, highly available web application which allows you to upload photos from your web camera to an S3 bucket. The whole example can be done within the AWS Free Tier. Technology stack:

  • Terraform
  • Ansible
  • Packer
  • AWS EC2
  • AWS S3
  • A piece of HTML5 & JS
  • A piece of Go
  • Certbot

The full source can be found here: 

The application contains 3 main parts:

  • Frontend
    • HTML5 video capture from camera
    • Snap photo and send it to backend using JQuery Ajax call
    • List existing photos in bucket as hyperlinks
  • Backend
    • Serve frontend
    • Integrate with S3
    • Expose actions like upload, list, get particular image
  • Infrastructure
    • Bring up EC2 instances in ASG from baked AMI with application
    • Spread across 3 AZs
    • Spawn ALB to access application

Proposed design:

form-you-cloud-1 (1)

Go app

Our application is a really simple web application, exposing a few HTTP endpoints and integrating with AWS S3.

Main libraries: gorilla/mux (HTTP routing) and the AWS SDK for Go (S3 integration).

Endpoints:

  • GET / – dummy web UI with camera capture using HTML5 video and list of photos
  • GET /image/{image_name} – get particular image/photo
  • POST /image – save new image/photo
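
For orientation, here is a minimal sketch of how those endpoints could be registered with gorilla/mux – this wiring is my assumption, not taken from the repository; GetImage and PostImage are hypothetical handler names, only GetRootSite and GetCertbotAuth appear in the snippets later in this article:

// Sketch only: route registration for the endpoints listed above.
// GetImage and PostImage are assumed handler names.
func newRouter() *mux.Router {
	r := mux.NewRouter()
	r.HandleFunc("/", GetRootSite).Methods("GET")
	r.HandleFunc("/image/{image_name}", GetImage).Methods("GET")
	r.HandleFunc("/image", PostImage).Methods("POST")
	return r
}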

A screenshot of the simple UI – you can see an A4 paper which I'm showing to the camera:

Screenshot from 2018-07-27 19-12-04

That UI is served directly from Go, as it's stored in the application as a big multi-line string. The list of images is generated using Go's built-in template engine:

{{ range $key, $value := . }}
	<li><a href="/image/{{ $value.Key }}">{{ $value.Key }}</a></li>
{{ end }}

The collection to iterate over is generated in the following way:

func GetRootSite(ResponseWriter http.ResponseWriter, _ *http.Request) {
	svc := SpawnAwsSdkS3Service()
	resp, err := svc.ListObjects(&s3.ListObjectsInput{Bucket: aws.String(bucket)})
	if err != nil { panic(err) }

	tmpl, err := template.New("hello").Parse(site_template)
	if err != nil { panic(err) }
	err = tmpl.Execute(ResponseWriter, resp.Contents)
	if err != nil { panic(err) }
}

Comments to the code:

SpawnAwsSdkS3Service() – my wrapper for the S3 session from the AWS SDK

ListObjects – list the bucket contents to get the existing images

template.New("hello").Parse(site_template) – parse site_template – a big string in the code containing the HTML/JS UI

tmpl.Execute – render the template directly to the HTTP response, passing the listing from ListObjects as the parameter

The trick with the HTML5/JS camera comes from:

I modified the script a little to snap the camera view into a hidden canvas and send it to the endpoint using a jQuery AJAX call after clicking the Snap&Upload button. To see the effect I simply reload the site after the upload, which adds the new image to the list. What's interesting here: the canvas snapshot (toDataURL) returns the photo as base64 with a content-type prefix, so I'm just sending that to the backend and then simply removing the prefix and converting the rest to bytes:

decoded, err := base64.StdEncoding.DecodeString(strings.Split(string(body), ",")[1])

The S3 file upload looks like this:

_, err1 := svc.PutObject(&s3.PutObjectInput{Key: aws.String(strconv.FormatInt(time.Now().UnixNano(), 10)+".jpg"), Bucket: aws.String(bucket), Body: bytes.NewReader(decoded)})

so it creates a new file in the S3 bucket with a numeric timestamp as the name.
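
Putting those pieces together, the whole upload handler could look roughly like this – a sketch assembled from the snippets above, where the handler name PostImage and the panic-style error handling are my assumptions:

// Sketch: receive the base64 data URL from the browser, strip the
// "data:image/jpeg;base64," prefix, decode it and store it in S3.
func PostImage(w http.ResponseWriter, r *http.Request) {
	body, err := ioutil.ReadAll(r.Body)
	if err != nil { panic(err) }

	decoded, err := base64.StdEncoding.DecodeString(strings.Split(string(body), ",")[1])
	if err != nil { panic(err) }

	svc := SpawnAwsSdkS3Service()
	_, err = svc.PutObject(&s3.PutObjectInput{
		Key:    aws.String(strconv.FormatInt(time.Now().UnixNano(), 10) + ".jpg"),
		Bucket: aws.String(bucket),
		Body:   bytes.NewReader(decoded),
	})
	if err != nil { panic(err) }
}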

One last important thing: the application takes the S3 bucket name as a startup argument, to allow dynamic configuration and avoid hardcoding it in the code.
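
A minimal startup sketch could then look like this – assuming the bucket name is read from os.Args, the router helper sketched earlier is used, and the app listens on port 8000 (the port used by the ALB target group later in this article):

// Sketch: read the bucket name from the first argument and start the server.
func main() {
	bucket = os.Args[1] // package-level variable used by the S3 handlers

	log.Fatal(http.ListenAndServe(":8000", newRouter()))
}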

How about state?

As we have the application, we can now take a look at the infrastructure/deployment. Firstly, before working with Terraform, it's good practice to store the state in an S3 bucket.

variable "state_bucket_name" {}

resource "aws_s3_bucket" "state" {
  bucket = "${var.state_bucket_name}"
  acl    = "private"

  versioning {
    enabled = true
  }
}


Activated versioning is important, especially in case of a damaged state, as you can roll back the state to a previous version – fixing a damaged state in Terraform is something that should definitely be avoided. We also want to keep that bucket private, as it reveals some internals of the infrastructure construction, which would be useful for hackers.

Ok, so we have a bucket for state, but building that S3 bucket also generates state – a classic chicken and egg problem: where should I store that state? As it's a one-time operation, it's totally safe to store that state on the file system, so usually I just commit it to the git repository.

To tell Terraform to use that state bucket we need to add the following config:

terraform {
  backend "s3" {
    key = "deployment"
  }
}

Here key is the path to the state file inside the bucket.

After adding that config we need to initialize Terraform and activate the S3 state store with the command:

terraform init -backend-config="bucket=bucket_name" -backend-config="region=bucket_region"

Bake image with Ansible and Packer

As the application is ready, it needs to be delivered to AWS somehow. To achieve that we are going to use Ansible and Packer to build the application and create an AMI, following the immutable infrastructure pattern.

To start with Packer we need a subnet in AWS, as the flow which Packer performs for you is:

  • start new EC2 instance
  • apply Ansible playbook
  • create AMI
  • destroy EC2 instance

The code for placing the subnet in TF:

resource "aws_subnet" "packer" {
  vpc_id = "${}"
  cidr_block = "${local.packer_cidr}"
  availability_zone = "${local.region}a"

  tags {
    Name = "packer"
  }
}

resource "aws_route_table_association" "packer" {
  subnet_id = "${}"
  route_table_id = "${}"
}

Comment to the code:

cidr_block – it's good practice to define your network CIDRs in one file and refer to them using TF locals, as it gives you a better overview of the network design

Now we need an Ansible playbook to build the application and transfer it to the destination machine, which can be done in a naive way, as Packer actually starts Ansible on your machine to provision the EC2 instance used for building the AMI.

- hosts: all
  become: yes
  gather_facts: yes
  vars:
    ansible_python_interpreter: /usr/bin/python3
  tasks:
    - name: build app
      become_user: jbujny
      local_action: shell go build chdir=../../goapp
    - name: copy go app
      copy:
        src: ../../goapp/goapp
        dest: /opt/goapp
        mode: 0777

Comments to the playbook:

ansible_python_interpreter – you need this when running Ansible against a fresh Ubuntu, to tell Ansible that it should use python3 instead of python2

build app – build our Go app on the host machine (my PC)

copy go app – copy the built application to the remote host

So finally the Packer JSON file:

{
  "description": "Go app for uploading files",
  "builders": [
    {
      "type": "amazon-ebs",
      "ami_name": "go-app-{{ timestamp }}",
      "region": "eu-central-1",
      "source_ami_filter": {
        "filters": {
          "virtualization-type": "hvm",
          "name": "ubuntu/images/*ubuntu-bionic-18.04-amd64-server-*",
          "root-device-type": "ebs"
        },
        "owners": ["099720109477"],
        "most_recent": true
      },
      "instance_type": "t2.micro",
      "ssh_username": "ubuntu"
    }
  ],
  "provisioners": [
    {
      "type": "ansible",
      "playbook_file": "playbook.yml",
      "user": "ubuntu"
    }
  ]
}

Comments to the file:

ami_name – that name is important as we will refer to it in Terraform

source_ami_filter – find Ubuntu 18.04 as the base image

ssh_username – that user has sudo on Ubuntu machines in AWS

provisioners – tell Packer to use Ansible to provision the AMI

Now we can start Packer by simply typing:

packer build packer.json

and Packer should create the AMI for us, so we have a ready image with the application – now it's time for the infrastructure.


Network

Our application needs the following network to operate:

  • VPC
  • 3 subnets for the auto scaling group, to spread EC2 instances and make a multi-AZ HA application
  • 3 subnets for the application load balancer
  • security group for EC2 instances
  • security group for the ALB

We are going to place the network in a separate Terraform state to make better isolation between different parts of the application.

Firstly, to allow communication with clients, we need an internet gateway and proper routing:

resource "aws_internet_gateway" "internet" {
  vpc_id = "${}"

  tags {
    Name = "main"
  }
}

resource "aws_route_table" "internet" {
  vpc_id = "${}"

  route {
    cidr_block = ""
    gateway_id = "${}"
  }

  tags {
    Name = "internet"
  }
}

Here is an example of a single subnet which is part of the ASG:

resource "aws_subnet" "asg_a" {
  vpc_id = "${}"
  cidr_block = "${local.asg_a}"
  availability_zone = "${local.region}a"

  tags {
    Name = "asg_a"
  }
}

output "asg_a_subnet_id" {
  value = "${}"
}

resource "aws_route_table_association" "asg_a" {
  subnet_id = "${}"
  route_table_id = "${}"
}

I don't like loops in TF, as I think that a declarative language should avoid imperative constructs, so when I need to define <= 3 resources related to AZs I just use the copy&paste approach.

Next we need an SG for the ASG: egress in this form is required to access AWS S3, and ingress is required to allow incoming traffic from the ALB.

resource "aws_security_group" "asg" {
  vpc_id = "${}"
  name = "asg"

  ingress {
    from_port = 8000
    to_port = 8000
    protocol = "tcp"
    cidr_blocks = ["${aws_subnet.alb_a.cidr_block}", "${aws_subnet.alb_b.cidr_block}", "${aws_subnet.alb_c.cidr_block}"]
  }

  egress {
    from_port = 443
    to_port = 443
    protocol = "tcp"
    cidr_blocks = [""]
  }
}

And the SG for the ALB: egress is required to access our application in the ASG, ingress is required to allow clients to connect to our application. We are allowing the HTTP port here.

resource "aws_security_group" "alb" {
  vpc_id = "${}"
  name = "alb"

  egress {
    from_port = 8000
    to_port = 8000
    protocol = "tcp"
    cidr_blocks = ["${aws_subnet.asg_a.cidr_block}", "${aws_subnet.asg_b.cidr_block}", "${aws_subnet.asg_c.cidr_block}"]
  }

  ingress {
    from_port = 80
    to_port = 80
    protocol = "tcp"
    cidr_blocks = [""]
  }
}


We have the AMI with the application in place and the network created, so now it's time to bring up the infrastructure and expose the application to the Internet. To achieve that we need:

  • S3 buckets for photos
  • IAM role with a policy which can be attached to EC2 instances, allowing access to the S3 bucket with photos
  • ASG definition with spread across AZs
  • ALB exposing UI and API

The S3 bucket definition is quite obvious and very similar to the state bucket, so let's take a look at the IAM role definition:

resource "aws_iam_role_policy" "iam_role_policy" {
  name = "web_iam_role_policy"
  role = "${}"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["${}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": ["${}/*"]
    }
  ]
}
EOF
}

We need to allow our application to:

  • list bucket to get existing photos
  • put new object in bucket to upload new photo
  • get object from bucket to serve uploaded photo

Please note the asterisk in the second statement – that's quite important and a little tricky, as giving put/get on just the bucket without the asterisk will cause a deny error from AWS.

To start EC2 instances in an ASG we need a launch configuration, which defines the attributes of a particular EC2 instance – so it's comparable to the aws_instance resource, with a few additional parameters. We also need to find the built AMI automatically, as we don't want to manually read the created image id and hardcode it in the configuration.

data "aws_ami" "goapp" {
  most_recent = true

  filter {
    name = "name"
    values = ["go-app-*"]
  }

  filter {
    name = "virtualization-type"
    values = ["hvm"]
  }

  # assumption: the AMI is owned by our own account, as it was baked by Packer
  owners = ["self"]
}

resource "aws_launch_configuration" "goapp" {

  lifecycle {
    create_before_destroy = true
  }

  name_prefix = "goapp-"
  iam_instance_profile = "${}"
  image_id = "${}"
  instance_type = "t2.micro"
  associate_public_ip_address = true
  security_groups = ["${data.terraform_remote_state.network_state.asg_sg_id}"]

  root_block_device = {
    volume_type = "gp2"
    volume_size = 8
    delete_on_termination = true
  }

  user_data = <<EOF1
cd /opt/
./goapp ${}
EOF1
}

Comments to the code:

data "aws_ami" – this data block finds the latest AMI with the proper name prefix, so we can refer to it in other blocks

create_before_destroy – as the Terraform documentation says: "In order to update a Launch Configuration, Terraform will destroy the existing resource and create a replacement. In order to effectively use a Launch Configuration resource with an AutoScaling Group resource, it's recommended to specify create_before_destroy in a lifecycle block."

user_data – just start the application in user data – in a real application we should use some other mechanism, like running the application as a systemd service or in a docker container, because in the current design the app won't come back up after a machine reboot.

The security_groups line needs a deeper description, as we are referring to the networking module there using the remote state trick.

data "terraform_remote_state" "network_state" {
  backend = "s3"
  config {
    bucket = "jakubbujny-article-form-your-cloud-1"
    region = "eu-central-1"
    key = "network"
  }
}

So in Terraform you can say that you want to use output variables from some other state by specifying the state bucket name and state file name, which allows you to divide complex environments into smaller parts.

The next step is to define the ASG:

resource "aws_placement_group" "placement" {
  name = "goapp"
  strategy = "spread"
}

resource "aws_autoscaling_group" "goapp" {
  name = "goapp"
  max_size = 3
  min_size = 3
  placement_group = "${}"
  desired_capacity = 3
  launch_configuration = "${}"
  vpc_zone_identifier = [
    "${data.terraform_remote_state.network_state.asg_a_subnet_id}",
    "${data.terraform_remote_state.network_state.asg_b_subnet_id}",
    "${data.terraform_remote_state.network_state.asg_c_subnet_id}"
  ]

  tag {
    key = "Name"
    value = "goapp"
    propagate_at_launch = true
  }
}

Comment to the code:

aws_placement_group – the placement group allows us to say whether we want to place ASG instances in a cluster, so they will be deployed "near" each other to allow fast data exchange, or to use "spread", which means that the ASG will be spread across AZs

Finally the ALB can be attached to the ASG to expose the application.

resource "aws_alb" "goapp_alb" {
  name = "goapp-alb"
  # the SG and subnet ids below are outputs from the network state (names assumed by analogy to asg_sg_id/asg_a_subnet_id)
  security_groups = ["${data.terraform_remote_state.network_state.alb_sg_id}"]
  internal = false
  subnets = [
    "${data.terraform_remote_state.network_state.alb_a_subnet_id}",
    "${data.terraform_remote_state.network_state.alb_b_subnet_id}",
    "${data.terraform_remote_state.network_state.alb_c_subnet_id}"
  ]
}

resource "aws_alb_target_group" "alb_target_group" {
  name     = "goapp-alb"
  port     = "8000"
  protocol = "HTTP"
  vpc_id   = "${data.terraform_remote_state.network_state.vpc_id}"

  tags {
    name = "goapp"
  }

  health_check {
    healthy_threshold   = 3
    unhealthy_threshold = 10
    timeout             = 5
    interval            = 10
    path                = "/"
    port                = "8000"
  }
}

resource "aws_alb_listener" "alb_listener" {
  load_balancer_arn = "${aws_alb.goapp_alb.arn}"
  port              = "80"
  protocol          = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.alb_target_group.arn}"
    type             = "forward"
  }
}

resource "aws_autoscaling_attachment" "attachement" {
  alb_target_group_arn   = "${aws_alb_target_group.alb_target_group.arn}"
  autoscaling_group_name = "${}"
}

The first block is quite obvious – create the ALB and place it in the proper network subnets. The second block defines the target group – where we want to attach our ALB – so we need to specify the application protocol/port and a simple health check. The 3rd block defines the listener, i.e. the part of the ALB which sees clients coming from the internet – the default action is to forward requests to the target group. The last block is the obvious ALB attachment.

So finally we have all the parts of our application and we can see the effect, but…

Oh no! SSL!

Using that configuration our application is exposed over plain-text HTTP, which sadly is a real problem, because Chrome won't allow you to grant a site permission to use your camera if that site is not secured with SSL – we need a fast workaround!

So I'm going to do it in a little hacky way – please do not use this in any real application.

A simple workaround is to use Certbot to issue an SSL certificate for our application, then attach that certificate to the ALB and expose an HTTPS/443 connection. It's not so easy, as AWS domains are on a blacklist in Certbot, probably to avoid a situation where hackers issue certificates to do bad things like MITM attacks.

I'm going to do that using the manual procedure of issuing a certificate, which means that Certbot generates a file name and file content which need to be exposed by the web server on the domain for which you want the certificate. So we need to:

  • Create another S3 bucket for that Certbot file
  • Modify policy
  • Modify ALB SG to allow 443
  • Add endpoint in application to expose that file on demand
  • Add some domain to our application – the easiest way is to make CNAME in some existing domain
  • Issue certificate
  • Create HTTPS forward in ALB using that cert

Looks like a lot of work? Actually it's really easy. Creating the S3 bucket is quite obvious – we just need to remember to add the new S3 bucket ARN to the policy to allow EC2 instances to access that bucket. We need to add a 2nd parameter in the application to pass that bucket name, and the following lines of code:

func GetCertbotAuth(w http.ResponseWriter, r *http.Request) {
	vars := mux.Vars(r)
	svc := SpawnAwsSdkS3Service()
	resp, err := svc.GetObject(&s3.GetObjectInput{Bucket: aws.String(bucket_auth), Key: aws.String(vars["auth"])})
	if err != nil { panic(err) }
	io.Copy(w, resp.Body)
}
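
Certbot's HTTP-01 challenge looks for the token under the /.well-known/acme-challenge/ path, so that handler needs a route on that path – a minimal sketch, assuming the same gorilla/mux router as before:

// Assumption: register the Certbot handler on the standard ACME HTTP-01 challenge path.
r.HandleFunc("/.well-known/acme-challenge/{auth}", GetCertbotAuth).Methods("GET")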

For the CNAME record I just used the domain on which I have my blog. As that domain is parked on WordPress DNS, I just added it in the panel:

Screenshot from 2018-07-27 19-13-33

So that CNAME refers to the ALB domain. After that operation I can issue a certificate for the domain by running the certbot command, manually uploading the required file to the S3 bucket and confirming in Certbot that it's ready for verification – Certbot will connect on port 80 and search for the specified file with the specified content. After that I have:

Screenshot from 2018-07-27 19-12-42

So far so good. In a normal application you would use a Route53 domain, a regular SSL certificate or one issued by Amazon for you, and define everything in Terraform. But we are hacking here a little, so I'm going to add the HTTPS endpoint to the ALB using the console, as it allows you to define the forward and paste the certificate content in one form, which will be uploaded to ACM:

Screenshot from 2018-07-27 19-15-51


So now finally I can access and upload my face to S3 – easy-peasy! 😀

Stream microservices logs from Kubernetes to Kafka


Application logs are very important for problem investigation and application health monitoring. Nowadays you want to monitor your logs and react to problems before your client even knows that something is wrong. Also, when your system loses stability, you need to react immediately and keep your logs available and up to date, as they are your 'sight' into the application – let's say that logs are the first eye and metrics are the second eye. In the microservices world log aggregation is even harder, because you have X instances of a particular microservice, every instance prints a huge number of log lines to tell you what's happening in the application, and those microservices are typically distributed across many different nodes – how to deal with that?

This article describes how to use Fluentd on Kubernetes acting as a Kafka producer to stream logs, and how to use Fluentd on a virtual machine acting as a Kafka consumer to push logs to Elasticsearch. I also want to show the possibilities which a log stream on Kafka gives you, by making a dummy Python implementation which sends notifications to Slack to inform developers/devops about detected exceptions.

Why Kafka?

Streaming logs via Kafka gives you the following advantages:

  1. As Kafka is extremely efficient, it's a safe 'buffer' for your logs when your log storage cannot consume a huge amount of logs during load spikes
  2. Logs on Kafka are ready to integrate – you can attach many consumers and place logs in different storage engines, or attach some analysis directly, e.g. Apache Spark
  3. If you treat Kafka as the integration bus for your system, you can avoid 'infrastructure spaghetti' by integrating different infrastructure modules through a single point

Proposed design


Fluentd on Kubernetes

Fluentd is deployed on Kubernetes using a DaemonSet. Please see the deployment descriptor below:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent
  namespace: fluent
  labels:
    k8s-app: fluent
    version: v1
    "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluent
        version: v1
        "true"
    spec:
      tolerations:
      - key:
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.1.3-debian-elasticsearch
        command: ["/bin/sh"]
        args: ["-c", "gem install fluent-plugin-kafka && cp /fluent-config/fluent.conf /fluentd/etc/ && /fluentd/"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-config
          mountPath: /fluent-config
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-config
        configMap:
          name: fluent
          items:
          - key: fluent_conf
            path: fluent.conf

Please keep an eye on the important sections in that file:

  • The toleration means that we also want to deploy Fluentd on Kubernetes masters
  • The command override installs the Kafka plugin and copies fluent.conf, which we attach as a config map

Here you can find the config map:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent
  namespace: fluent
data:
  fluent_conf: |
    @include kubernetes.conf

    <match **>
      @type kafka_buffered

      brokers             your-kafka-broker-host:9092

      topic_key             your-logs-topic
      default_topic         your-logs-topic
      output_data_type      json
      output_include_tag    true
      output_include_time   true
      get_kafka_client_log  false

      required_acks         1
      compression_codec     gzip
    </match>

In that config we simply tell Fluentd: please produce all collected logs as messages on Kafka. We are also activating gzip compression for optimization. We need at least one ack – you should also set the your-logs-topic replication factor to at least 2, to keep your logs when one broker fails.

Kafka consumer – fluentd and Elasticsearch

After streaming logs to Kafka you can easily consume them using another Fluentd instance – my advice here is to place that consumer on the same machine where you have Elasticsearch or, if you are using an Elasticsearch SaaS like AWS ES, at least in the same subnet. Here you can find the config:

<source>
  @type kafka_group
  brokers your-kafka-broker-host:9092
  consumer_group fluentd
  topics your-logs-topic
  format json
  use_record_time true
</source>

<match **>
  @type elasticsearch
  @id out_es
  log_level info
  include_tag_key true
  host your-elasticsearch-host
  port 443
  scheme https
  logstash_format true
  logstash_prefix kubernetes
  <buffer>
    flush_thread_count 8
    flush_interval 5s
    chunk_limit_size 6M
    queue_limit_length 256M
    retry_max_interval 30
    retry_forever true
  </buffer>
</match>


Please keep an eye on the important sections in that file:

  • We are consuming logs as the fluentd consumer group – if your-logs-topic has more than one partition, you can consume logs using multiple consumer instances
  • We are simply saving logs in the 'kubernetes' index in Elasticsearch

Kafka consumer – search & notify

The last step is to attach a consumer for log analysis. For that I'm going to modify an example from the confluent-kafka-python library, using Python 3.6:

from confluent_kafka import Consumer, KafkaError
import json
import requests

c = Consumer({
    'bootstrap.servers': 'your-kafka-broker-host',
    '': 'logs-inspector',
    'default.topic.config': {
        'auto.offset.reset': 'smallest'
    }
})

c.subscribe(['your-logs-topic'])

while True:
    msg = c.poll(1.0)

    if msg is None:
        continue
    if msg.error():
        if msg.error().code() == KafkaError._PARTITION_EOF:
            continue
        print(msg.error())
        continue

    message_dict = json.loads(msg.value().decode('utf-8'))
    if "log" in message_dict.keys():
        if "exception" in message_dict["log"].lower():
            notification = f"Found problem in log\nContainer: {message_dict['kubernetes']['container_name']}\nNamespace: {message_dict['kubernetes']['namespace_name']} \nPod: {message_dict['kubernetes']['pod_name']} \nHost: {message_dict['kubernetes']['host']}"
  "your-slack-webhook-url",  # placeholder: put your Slack incoming webhook URL here
                data=json.dumps({"attachments": [{
                    'text': notification,
                    'color': "#00FF00"
                }]}),
                headers={'Content-Type': 'application/json'})

That script is very simple: consume a log message from the topic, check whether the log key exists and whether it contains an "exception" message – if yes, send a notification via a Slack webhook to your team members. Please note the additional information which Kubernetes gives you, like the node, namespace and pod name.