Cloud Foundry Advisory Board Meeting - 2015 July

Foundation Update

Chip Childers was on vacation this week, so Chris Ferris from IBM gave an update from the perspective of the Foundation’s Board. The Board met the week prior, but the meeting was unfortunately disrupted by a New York stock market shutdown and grounded flights. As a result, they didn’t accomplish everything they wanted, but they did talk about the upcoming conferences. A CFP (call for participation) will be going out soon. Chris said the Foundation is currently getting ready for the OSCON conference, for which Cloud Foundry is a silver sponsor. They are also planning two mini CF Summits: one in Europe (potentially Berlin in October) and one in China (potentially Shanghai in mid-November). The final details will be rolled out in the next week or so, likely after OSCON wraps up.

CLI

Greg Oehman from VMware gave the CLI update. He said they have spent time on GitHub pull requests and issues, as they felt they had fallen behind there. They found that having two pairs of programmers reduced interrupts and context switching, and gave them more of a continuous flow than when a single pair had to cover everything.

The Plugin API is finished and was released with v6.12.0. They are looking forward to getting feedback from plugin authors. A couple of interesting plugins have already been added to the plugin repository, including one from Swisscom that visualizes statistics coming from applications.

They have been working with IBM’s design teams to determine what the front page of cf help should look like to make it more informative and helpful. They have drawn up a list of roughly a dozen help topics that they will share with the community for feedback.

Diego

Eric Malm from VMware gave an update on Diego. They have run some “one hundred cell” experiments on a Diego cluster and found that some periodic bulk operations were taking too long and timing out, for example gathering information about all the long-running processes (LRPs) to construct the gorouter’s routing table. They are refactoring the structure of some of their internal data models and changing how they are serialized, which will reduce the amount of data these bulk operations retrieve. A proxy server is also being inserted in front of the etcd database, in support of a long-term goal of versioning and migrating data.
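To illustrate why reshaping the data helps, here is a minimal Python sketch assuming a hypothetical data model (RoutingInfo, DesiredLRP and ActualLRP are illustrative names, not Diego's actual types). Building a routing table only needs each process's hostnames and the addresses of its running instances, so if the slim routing projection is stored separately, a bulk read for routing never has to fetch or deserialize the bulky run details.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RoutingInfo:
    """Slim projection: only the fields the router needs (hypothetical)."""
    process_guid: str
    hostnames: List[str]


@dataclass
class DesiredLRP:
    """Full record: routing info plus bulky run details (hypothetical).

    If routing and run_info live under separate keys, the routing bulk
    read can skip run_info entirely.
    """
    routing: RoutingInfo
    run_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class ActualLRP:
    """A running instance and the host address it is reachable on."""
    process_guid: str
    host_ip: str
    host_port: int


def build_routing_table(routing_infos: List[RoutingInfo],
                        actual_lrps: List[ActualLRP]) -> Dict[str, List[str]]:
    """Map each hostname to the 'ip:port' backends currently serving it."""
    backends_by_guid: Dict[str, List[str]] = {}
    for lrp in actual_lrps:
        backends_by_guid.setdefault(lrp.process_guid, []).append(
            f"{lrp.host_ip}:{lrp.host_port}")

    table: Dict[str, List[str]] = {}
    for info in routing_infos:
        for host in info.hostnames:
            table.setdefault(host, []).extend(
                backends_by_guid.get(info.process_guid, []))
    return table
```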

They have finished securing access to the etcd database within Diego using mutual SSL authentication. This covers both client connections to the server cluster and peer-to-peer communication within the cluster when it runs with multiple nodes. The exercise has also helped them understand how they might secure communication between other Cloud Foundry components.
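As a rough illustration of what “mutual” means here, the Python sketch below builds the two TLS contexts involved: the server requires a certificate from every client, and the client verifies the server, with both sides anchored in a cluster CA. This is not how etcd itself is configured (etcd takes its own certificate flags), and the certificate, key and CA file paths are hypothetical parameters. For peer-to-peer traffic, each node would use both contexts: one when accepting connections from other members and one when dialing them.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   ca: Optional[str] = None) -> ssl.SSLContext:
    """Context for the listening side (e.g. a database member accepting clients)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual TLS: refuse any client that does not present a CA-signed certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # our own identity
    if ca:
        ctx.load_verify_locations(cafile=ca)  # trust only the cluster CA
    return ctx


def client_context(cert: Optional[str] = None, key: Optional[str] = None,
                   ca: Optional[str] = None) -> ssl.SSLContext:
    """Context for the dialing side (a client, or a member dialing a peer)."""
    # PROTOCOL_TLS_CLIENT verifies the server certificate and hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # present our certificate
    if ca:
        ctx.load_verify_locations(cafile=ca)  # verify the server against the cluster CA
    return ctx
```

In use, a server would wrap its listening socket with `server_context(...).wrap_socket(sock, server_side=True)`, and a client would wrap its outgoing socket with `client_context(...).wrap_socket(sock, server_hostname=...)`.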

Eric said they are also looking at options for encrypting data within etcd. They would like to be able to rotate the encryption keys for this data, which is currently a problem given the state of etcd.

The initial version of the cf ssh CLI plugin has been released, allowing users to SSH into application instances running on Diego. The last task for this SSH functionality is to determine the correct policy for application instances that have been accessed over SSH and altered during an interactive session. One suggestion was to automatically restart instances after the SSH session ends to clear any changes, while others would prefer tighter restrictions on traffic routed to an instance during the session. Eric is seeking more feedback from the community on this, including other options that might be considered.

Also see [cf-dev] SSH access to CF app instances on Diego

Another major milestone is that Diego has been updated to the latest Garden Linux. New features include better support for user namespaces and for dropping capabilities, which will help with running Docker images that use the USER directive on Diego. Also included is the Go rewrite of wshd, which manages the containers. They will perform another round of experiments to ensure it performs as expected.
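User namespaces are what allow a user inside a container, including root, to map to an unprivileged user on the host. The Linux kernel expresses this mapping through /proc/&lt;pid&gt;/uid_map, where each line reads `inside outside count`. The Python sketch below parses that format and translates a container uid to a host uid; the example range is hypothetical and is not Garden Linux's actual mapping.

```python
from typing import List, Tuple

UidRange = Tuple[int, int, int]  # (first uid inside, first uid outside, count)


def parse_uid_map(text: str) -> List[UidRange]:
    """Parse uid_map content: each line is 'inside outside count'."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, count = (int(f) for f in line.split())
        ranges.append((inside, outside, count))
    return ranges


def to_host_uid(container_uid: int, ranges: List[UidRange]) -> int:
    """Translate a uid inside the namespace to the uid the host kernel sees."""
    for inside, outside, count in ranges:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    raise ValueError(f"uid {container_uid} is not mapped")
```

With a mapping such as `0 100000 65536`, an image whose USER is root (uid 0) runs as host uid 100000, and an image whose USER is uid 1000 runs as host uid 101000; neither has root privileges on the host.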

MEGA

Dieu Cao from VMware explained the breakup of the Runtime team, which had a lot of responsibilities. They have formed a new team called “MEGA” (the name “Voltron” has also been mentioned), which is currently responsible for cutting cf-releases, for the older Runtime components such as the DEA, Warden and HM9000, and for pull requests against integration and the “A1” environment, where they deploy and test cf-release. The team has also taken on the charter of separating the components out into composable releases.

Amit Gupta, the new Product Manager of the MEGA team, introduced himself. He was previously on the Diego team and started as a PM a week ago. He said the MEGA team is also formally known as the “OSS Release Integration Team”.

Amit said they are responsible for pulling things out of cf-release into their own common releases, such as etcd-release, which other releases such as Loggregator and Diego can then consume. They will also help the Identity team pull the UAA out into a release that both cf-release and BOSH can consume.

Another goal is more robust testing and integration automation around cf-release. They currently have the A1 environment for testing, but it does not cover performance, fault-tolerance or stability testing; nothing lets a deployment sit for a week while they watch for memory leaks, lock-ups and thresholds being hit.

Also see: [cf-dev] Refactoring Runtime and “MEGA Mandate”

Updates from the MEGA team for the past week include extracting etcd-release; they are now working on extracting consul-release, setting up pipelines for these releases, testing them and publishing them to bosh.io.

Longer-term they want to generalize the patterns that have been introduced for Diego around manifest generation.

Cornelia Davis from VMware asked Amit to confirm whether manifest generation was moving away from Spiff, to which Amit replied “yes”.

CAPI

Another spin-out of the Runtime team is the CAPI team, which is run by Dieu Cao and is responsible for the Cloud Controller. They are also taking responsibility for the Services API and these backlogs are being merged.

Dieu said that they are about to announce Arbitrary Service Parameters and Service Keys, since this work is complete. Some of this work depends on the availability of CLI v6.12.1 and some on cf-release v214.

Work on Application Process Types is continuing. Previously they planned to support this on both the DEA and Diego, but now they plan to support it only on Diego. They do not intend to break anything for the DEA.

They have started work on the “Dashboard per instance” epic and also the “Private brokers” epic.

Routing

Shannon Coen from VMware gave an update on Routing. He said they have been focused on Route Services and Non-HTTP traffic.

Route Services will enable a user to associate an application route with a service instance. These special service instances will proxy or transform application requests. When one of these services is associated with a route, gorouter will forward requests for that route to a URL specified for the service. After transforming or processing a request, the route service will forward it back through the load balancer and the router to the application.
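The request flow described above can be modeled in a few lines. This Python sketch is a toy built on assumptions, not gorouter's implementation: the `X-CF-Forwarded-Url` marker header, the `Router` and `Request` classes and the example service are hypothetical, and calling `router.handle` a second time stands in for the trip back through the load balancer. A real router would also have to stop clients from setting the marker header themselves, for example by requiring a signature on it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

FORWARDED_URL = "X-CF-Forwarded-Url"  # assumed marker header name


@dataclass
class Request:
    host: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


class Router:
    def __init__(self, apps: Dict[str, Callable], route_services: Dict[str, Callable]):
        self.apps = apps                      # hostname -> application handler
        self.route_services = route_services  # hostname -> bound route service

    def handle(self, req: Request):
        service = self.route_services.get(req.host)
        if service is not None and FORWARDED_URL not in req.headers:
            # First pass: record the original destination and hand the
            # request to the route service instead of the application.
            req.headers[FORWARDED_URL] = f"https://{req.host}{req.path}"
            return service(req, self)
        # Second pass, or no route service bound: deliver to the application.
        return self.apps[req.host](req)


def checking_service(req: Request, router: Router):
    """Hypothetical route service: annotate the request, then send it back."""
    req.headers["X-Example-Checked"] = "true"
    # Stands in for forwarding to the original URL via the load balancer.
    return router.handle(req)
```

The design choice the sketch highlights is that the router, not the application, decides whether a request still needs to visit the service, so applications need no changes to gain the extra processing.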

Non-HTTP traffic, or “TCP routing”, has been added for Diego via a new “routing” BOSH release that is now working in development and test environments. It does not yet have full CRUD functionality, but it does have the initial mapping of external ports to the host IPs and ports provided by Diego.
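Unlike HTTP routing, a TCP router cannot inspect a Host header, so it routes on the external port a client connects to. Below is a minimal Python sketch of that lookup, assuming a round-robin choice among the host IP and port pairs Diego reports; the class name and port numbers are hypothetical, and the routing release's actual data model and balancing strategy may differ.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Backend = Tuple[str, int]  # (host IP, host port) of a container instance


class TcpRouteTable:
    """Hypothetical table: external router port -> backends reported by Diego."""

    def __init__(self) -> None:
        self._backends: Dict[int, List[Backend]] = {}
        self._cursors: Dict[int, Iterator[Backend]] = {}

    def register(self, external_port: int, host_ip: str, host_port: int) -> None:
        self._backends.setdefault(external_port, []).append((host_ip, host_port))
        # Rebuild the round-robin cursor so it includes the new backend.
        self._cursors[external_port] = itertools.cycle(self._backends[external_port])

    def pick(self, external_port: int) -> Backend:
        """Choose a backend for a new TCP connection on this external port."""
        if external_port not in self._cursors:
            raise KeyError(f"no route for port {external_port}")
        return next(self._cursors[external_port])
```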

TCP routing functionality has been applied to Lattice. Shannon said that they believe Lattice provides a good development environment for developers working on IoT (Internet-of-Things) applications to get feedback. Once this is in place in Lattice they will return to improving the BOSH release and the integration with Cloud Foundry and the Cloud Controller user-experience.

Buildpacks

Mike Dalessio from VMware gave an update on Buildpacks. Since there have been no objections, on Monday 20th July they will cut new versions of all the buildpacks containing binaries built specifically for the Cloud Foundry stack. Until now, they have been using Heroku’s binaries, which just happen to work.

Changes to the PHP buildpack will be rolled into the beta branch in the next few days. Mike said the buildpack currently follows a “million module approach”, in which every module is treated as a separate binary, making the manifest extremely large. They will move away from that model to a single PHP binary that contains all the modules, so that they do not have to be loaded dynamically. Mike would like to hear on the cf-dev mailing list if anyone finds this objectionable.

The rootfs build has been taken over by the Buildpacks team, and the build pipeline can be found here. That work is going well, and the Runtime team has been very helpful with it.

Two new repositories have been open-sourced – binary-builder for tooling used to build the CF-specific binaries for buildpacks and buildpacks-ci for their Concourse pipelines. Mike believes this may be the largest Concourse deployment in the Universe. In the chat, this instigated a Concourse deployment size challenge from Colin Humphreys of Cloud Credo, who said they have “some pretty big Concourse pipelines” themselves.

Core Services

Marco Nicosia from VMware gave an update on Core Services, saying it has been a quiet month because they have spent a lot of time converting their pipelines to Concourse. He said this is one of the rare times he has seen developers happy to be working on pipelines, and it is going well. Another reason for the quiet month is that they have been working out what it would take to offer MySQL inside of Runtime. The Cloud Controller can use MySQL as its database, which opens up the possibility of making it highly available; it currently uses Postgres, which does not allow this. The Core Services and Runtime teams have therefore been discussing what it would take to migrate smoothly from Postgres to MySQL during an upgrade.

Marco said that the v21 release of cf-mysql-release was no good, and they now have v22, which is in pre-release acceptance. Release notes should go out within a week. The highlights of this release are updated binaries for all the MySQL components and a configurable plan stub that makes it easier for operators to write custom plans. An HTTPS-only mode has also been added. New documentation gives better insight into how the system scales, what it means to add a node, and how to go down to a singleton and back up again. A small problem with AWS multiple-availability-zone support, where a network was missing from the configuration, was also fixed. In summary, Marco said that v22 will work a lot better than v20.

Lattice

Marco said they just released v0.2.6 of Lattice as an “earlier feedback release”. The Lattice team is doing a lot of work on the Condenser epic, which breaks the buildpack process down into its individual components rather than giving a CF-like experience, allowing developers to play with the parts. They have done “the first 80%”, and “the second 80%” will be making sure all the buildpacks work properly, the quirks are understood, and everything is polished to meet expectations. This will be released in v0.3.0 of Lattice. In response to a question from Cornelia Davis, however, Marco said the first half is already in v0.2.6, and with considerable effort you could get it to sort of work with Go buildpacks.

A Windows inception recently took place. Apart from private Docker repo automation, this is the next major epic the Lattice team will work on, and it is intended to give a full picture of what Windows support might look like when running on Diego.

They are collaborating with the Routing team on TCP routing, which matters to the Lattice team because it makes service discovery a lot easier inside Lattice. Currently they have to figure out the IP addresses, which is a pain.

UAA

Sree Tummidi from VMware gave the UAA update. She said they released UAA v2.4.0 at the beginning of the month, which adds password policy support. Password lockout, a feature that already existed but was not multi-tenant, is now multi-tenant and will be in the upcoming v2.4.1 release.
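Making lockout multi-tenant essentially means that both the policy and the failure counts are scoped per identity zone. Here is a minimal Python sketch under that assumption; the field names and the simple rule (locked while enough recent failures exist and the latest one is within the lockout period) are illustrative, not UAA's actual configuration keys or algorithm.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LockoutPolicy:
    # Hypothetical settings, one policy per zone.
    lockout_after_failures: int = 5
    count_failures_within_s: int = 3600
    lockout_period_s: int = 300


class LockoutTracker:
    def __init__(self, policies: Dict[str, LockoutPolicy]):
        self.policies = policies  # zone id -> that zone's policy
        # Failures are keyed by (zone, user) so tenants never share counts.
        self.failures: Dict[Tuple[str, str], List[float]] = {}

    def record_failure(self, zone: str, user: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.failures.setdefault((zone, user), []).append(now)

    def is_locked(self, zone: str, user: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        policy = self.policies[zone]
        recent = [t for t in self.failures.get((zone, user), [])
                  if now - t <= policy.count_failures_within_s]
        if len(recent) < policy.lockout_after_failures:
            return False
        # Stay locked until the lockout period has passed since the last failure.
        return now - recent[-1] < policy.lockout_period_s
```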

Sree’s team also worked on some database housekeeping, cleaning up old codes generated during password resets or new user creation.

For multi-tenancy, Sree said they found some gaps in their implementation around how they handle scopes and group mappings in individual UAA zones, and they are working on fixing these. They are also introducing some new scopes for creating clients and identity providers, which will help them implement role-based access for zones.

They have started work on splitting UAA out into its own BOSH release, related to the MEGA work mentioned earlier.

Before the CAB call, the UAA team did an inception on multi-tenant SAML attributes and mapping them to OpenID claims and scopes in UAA. Work will start on this next week.

BOSH

Dmitry Kalinin from VMware gave the update for BOSH. He said the team is continuing to work on the same things as for the past 1.5 months. They have finished a feature called “Trusted Certificates” that allows the Director to be configured with a set of certificates that will be installed on all machines by default. Next they will continue work on compiled releases, which is close to completion. These are precompiled binaries for specific stemcell versions, which will reduce the time it takes to deploy cf-release and hopefully speed up its development.

Global Networking is currently being worked on. They are doing a lot of refactoring and adding a lot of integration tests for places that have not been tested before. Once this work is merged it will allow for things like stable IPs, which do not change unless you request a change.

The Links feature is also being worked on. It removes many IP addresses from the manifest. A working version exists on the same branch as the Global Networking work. They are experimenting with converting cf-release and cf-mysql-release to use Links, purely as a learning exercise, to help them understand any missing pieces.

An epic called Stig, for stemcell hardening, was recently started. It responds to questions about how stemcells are hardened and will verify that stemcells are as secure as they can be. Most of the stories in this epic are small verifications for which they are adding tests, and they will work through these over the next few months.

IBM China is helping to improve AWS and OpenStack CPI functionality. Soon the CPIs will be able to decide whether instance storage or EBS volumes should be used for ephemeral storage, which removes some of the pain of configuring this.

The BOSH team has been split into two teams: BOSH and BOSH CPI. The BOSH CPI team is responsible for finishing the Concourse pipeline configuration for all the CPIs and will then continue improving the CPIs.

There was a question about “the non-admin role” and whether it was near completion. Dmitry replied that they have finished the BOSH UAA integration on the BOSH side and are waiting for the UAA release to announce it officially. As part of the MVP for BOSH-UAA integration, two scopes will be respected: an “admin” scope, which allows full admin access to the whole BOSH Director, and a “read” scope, which allows users to view information but not take action.
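The two-scope MVP amounts to a simple authorization check. In this Python sketch the scope strings follow the names used in the meeting, and the action names are hypothetical; the exact scope strings and permission mapping a real Director uses may differ.

```python
from typing import Iterable

# Hypothetical Director actions, split into view-only and state-changing ones.
READ_ACTIONS = {"list_deployments", "show_tasks", "show_vms"}
WRITE_ACTIONS = {"deploy", "delete_deployment", "upload_release"}


def authorize(scopes: Iterable[str], action: str) -> bool:
    """Return True if a token carrying these scopes may perform the action."""
    granted = set(scopes)
    if "admin" in granted:
        return action in READ_ACTIONS | WRITE_ACTIONS  # full access
    if "read" in granted:
        return action in READ_ACTIONS  # view information, but take no action
    return False
```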

Windows

James Bayer gave an update on the Windows work VMware has been doing. He said he has personally deployed it alongside Lattice and had Linux and Windows containers running side by side. They are continuing to improve and harden it; it is not production-ready yet, but they are working towards that. They are looking for more people to try it out, and James asked anyone interested to contact the team. It currently works with Windows Server 2012 R2: the user just provides a Windows Server 2012 R2 instance, runs the Diego-for-Windows installer and points it at an existing Linux deployment of Diego. The Windows Server then automatically joins the Linux Diego cluster as another Cell, after which you can begin pushing Windows applications.

Loggregator

Erik Jasiak from VMware gave an update on Loggregator work. He said it has been a quiet month. Additional metrics, conversions and documentation have been added. They are trying to help people understand that they are moving from a pull model, where metrics are pulled from APIs, to a push model, where metrics come out of Loggregator. Some early deployments of nozzles were done for Datadog and for the varz downstream nozzle, mostly to help understand their use cases. They are in the process of adding rate-control messages, which help users understand when their clients are not keeping up. While adding these, they found that a nozzle that consumes only metrics and drops all logs would never see such a message if it were delivered as a log message. They therefore want to introduce new “metrics messages”, and Erik will be proposing names for these to the community. They are also looking at client disconnection cases, since they have received bug reports of clients disconnecting for unknown reasons; this is usually due to timeouts.
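One way to picture rate control is a bounded per-subscriber buffer that drops the oldest messages when a client falls behind and then reports how many were lost. In this Python sketch the envelope format and class name are hypothetical, not Loggregator's protocol. The drop report is emitted as a metric rather than a log, so a nozzle that filters out all logs would still receive it; that is one reading of the problem described above.

```python
from collections import deque
from typing import Deque, Dict, List


class SlowConsumerBuffer:
    """Hypothetical per-subscriber buffer that drops the oldest messages when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue: Deque[Dict] = deque()
        self.dropped = 0

    def push(self, envelope: Dict) -> None:
        if len(self.queue) >= self.capacity:
            self.queue.popleft()  # consumer is not keeping up: discard the oldest
            self.dropped += 1
        self.queue.append(envelope)

    def drain(self) -> List[Dict]:
        """Hand everything buffered to the consumer, drop report first."""
        out: List[Dict] = []
        if self.dropped:
            # Sent as a metric so metrics-only consumers still learn about the loss.
            out.append({"type": "metric", "name": "dropped_messages",
                        "value": self.dropped})
            self.dropped = 0
        out.extend(self.queue)
        self.queue.clear()
        return out
```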

Build reliability and speed are being worked on, along with more diligent testing for errors related to integration with other areas of Cloud Foundry.

The Loggregator team, based in Boulder, is currently hosting its first Cloud Foundry Dojo, with two developers from ActiveState in their office. They will bring them up to speed and introduce them to the metrics projects they are working on, as well as the improved pipelines.

Abacus

Dr. Max from IBM announced a new project called Abacus, which they are introducing as a proposal. It is a feature from Bluemix that allows you to project and aggregate the prices and costs of any running application, including breakdowns by service, buildpack and other criteria. Service brokers must provide usage information, and usage information is also extracted from the system itself. This usage information is then fed into a pipeline that provides metering data to end users.

IBM would like to contribute the core part of this to the Cloud Foundry community and has spent the past three months preparing it. They have vetted it with colleagues from VMware. A v0.0.1 release is available, and they encourage others to take a look at it.

Dr. Max said that this project is a little different from other Cloud Foundry projects. It uses a microservices architecture and can run either inside a Cloud Foundry cluster as applications or outside of Cloud Foundry. Sebastien, the lead developer, said he is excited to work with the community on it.

Also see [cf-dev] Incubation Proposal: Project CF-Abacus

James Bayer asked how much of a footprint someone trying Abacus would need in addition to their Cloud Foundry installation. Sebastien said it is just a set of Node.js applications, and each one can be configured to run with 512 MB. He said he uses 1 GB of disk space per app, but that is far more than needed. All the data is stored in a CouchDB database. In the demo, Sebastien used PouchDB, a CouchDB-compatible in-memory database, so you do not even need to set up a database to test Abacus.

There was a question about provisioning the CouchDB database within Cloud Foundry and whether a CouchDB service broker was available. Sebastien said they use Cloudant (IBM’s commercial database offering with a CouchDB interface), so people could sign up to use that. PouchDB could also be deployed as an application, since the CouchDB protocol runs over HTTP.

James Bayer said he is super excited about Abacus, because the usage-tracking capabilities in Cloud Foundry were very basic. With existing Cloud Foundry functionality you can track how long an application was deployed and a little metadata (which buildpack was used), and for services it tells you how long a service ran within a space. With Abacus you can track, meter and charge, for example, per email for an email service, as long as the service emits the correct counter. A service could now charge per MB instead of just by small, medium or large plans over a period of time, and James said charging along multiple dimensions would also be possible.
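The metering James describes boils down to aggregating usage counters and then applying a price per unit of each metric. Below is a minimal Python sketch with made-up record fields, service names and prices (not Abacus's actual schema or pricing model), using `Decimal` to avoid floating-point rounding in money arithmetic.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical price list: (service, metric) -> price per unit.
PRICES: Dict[Tuple[str, str], Decimal] = {
    ("email", "messages_sent"): Decimal("0.001"),
    ("storage", "megabytes"): Decimal("0.02"),
}


def aggregate(usage: Iterable[dict]) -> Dict[Tuple[str, str, str], Decimal]:
    """Sum reported quantities per (org, service, metric)."""
    totals: Dict[Tuple[str, str, str], Decimal] = defaultdict(Decimal)
    for rec in usage:
        key = (rec["org"], rec["service"], rec["metric"])
        totals[key] += Decimal(rec["quantity"])
    return dict(totals)


def charges(totals: Dict[Tuple[str, str, str], Decimal]) -> Dict[str, Decimal]:
    """Price each aggregated metric, then roll the amounts up into one bill per org."""
    bills: Dict[str, Decimal] = defaultdict(Decimal)
    for (org, service, metric), qty in totals.items():
        bills[org] += qty * PRICES[(service, metric)]
    return dict(bills)
```

Because each metric carries its own price, a service can be billed along several dimensions at once (messages sent and megabytes stored, say) instead of a single flat plan.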

“Cloud Foundry Advisory Board Meeting – 2015 July” originally appeared on the ActiveState blog.
