1 / 39

English - Kubernetes Pictures Book

Welcome

云原生社区 Kubernetes SIG

欢迎来到云原生社区 Kubernetes SIG（特别兴趣小组）！

加入 Kubernetes SIG

请填写申请表加入微信交流群。

如何提问

有关于 Kubernetes 的问题请提交 Issue 后将链接发到微信群中讨论。

Activity

Kubernetes 源码研习社

Kubernetes 源码研习社是由组织的 Kubernetes 源码特别兴趣小组（SIG），由热爱学习、注重个人成长的一帮小伙伴们自由、自愿成立的小组。每个人都非常希望从 Kubernetes 上学到知识，帮助自己实现成长和进步。欢迎加入，一起坚持，一起克服，一起成长。

源码研习社第二期（进行中）

本期主题：kube-scheduler 源码剖析

活动时间：2020.10.12 开始

如何报名：

活动介绍

Kubernetes 源码 scheduler 剖析，干就完事了。每周学习目标：

每周写笔记做总结。笔记链接：
每周六晚 7-10 点固定在线研讨 Kubernetes 调度器问题。腾讯会议号：4967324951
每日讨论 Kubernetes 源码问题
参阅本项目推荐的 kubernetes 相关文章

本期学习计划

坚持就是胜利

你能收获什么？

对 Kubernetes 核心源码有更深刻的理解
一群热爱云原生的志同道合的朋友

如何报名

进入报名 excel 表，填写自己信息即被认为是报名参加活动，每周按要求完成总结笔记，参与每周周末的讨论即可

加入我们

源码研习社也有自己的微信群，如何加入？

扫描下面的二维码，添加 Jimmy Song 好友，备注姓名-公司，留言“加入源码研习社”即可。

嘉宾介绍

郑东旭（Derek Zheng） BFE（万亿流量转发引擎）开源项目的作者之一，《Kubernetes 源码剖析》作者，擅长 Linux 下高性能服务器的开发，对云计算、区块链相关技术领域有深刻的理解。

源码研习社 SIG 小组成员

SIG 的全称是 Special Interests Group, 或称 Super Intellectual Genius。源码研习社 SIG 小组负责源码研习社活动的日常维护，目前的核心成员包括：

金润森
王文虎
赵卫国
王冬

Q&A:

Kuberenetes Design&Architecture Pictures

Introduction

Hello everyone, this time I bring you Kubernetes source code reading.

The source code design drawing needs to be used in conjunction with the source code. In addition to showing the key design, the omission of some implementation details will allow you to ask questions as you read. If you find that there are problems when you read the diagram, you can look at the source code with the problem and see how others design and solve it, which will improve your design ability. Let's Study with questions.

If you find that the picture is blurred, it is caused by the picture compression, right click and save it as the original download image or contact me (WeChat: abser9216) to send the original SVG image.

Author of the picture: Abserari，Oiar

Architecture

Container

Components

Docker Networking

Design

Sandbox: The protocol stack can contain multiple endpoints, which can be implemented by Namespace, Jail, etc.
Endpoint: Connect Sandbox with Network
Network: A collection of Endpoints that can communicate directly, which can be implemented using Bridge, VLAN, etc.

Docker Architecture

Network Controller

Initialize Network Controllers

Docker Daemon Manages the available NetWorkController. When launching Daemon, all available NetWorkController under the current operating system will be created.

On Unix Operating system: with daemon_unix.go as an example, create None, Host, Bridge network controller.

func (daemon *Daemon) initNetworkController(config *config.Config, activeSandboxes map[string]interface{}) (libnetwork.NetworkController, error) {
    netOptions, err := daemon.networkOptions(config, daemon.PluginStore, activeSandboxes)
    if err != nil {
        return nil, err
    }

    controller, err := libnetwork.New(netOptions...)
    if err != nil {
        return nil, fmt.Errorf("error obtaining controller instance: %v", err)
    }

    if len(activeSandboxes) > 0 {
        logrus.Info("There are old running containers, the network config will not take affect")
        return controller, nil
    }

    // Initialize default network on "null"
    if n, _ := controller.NetworkByName("none"); n == nil {
        if _, err := controller.NewNetwork("null", "none", "", libnetwork.NetworkOptionPersist(true)); err != nil {
            return nil, fmt.Errorf("Error creating default \"null\" network: %v", err)
        }
    }

    // Initialize default network on "host"
    if n, _ := controller.NetworkByName("host"); n == nil {
        if _, err := controller.NewNetwork("host", "host", "", libnetwork.NetworkOptionPersist(true)); err != nil {
            return nil, fmt.Errorf("Error creating default \"host\" network: %v", err)
        }
    }

    // Clear stale bridge network
    if n, err := controller.NetworkByName("bridge"); err == nil {
        if err = n.Delete(); err != nil {
            return nil, fmt.Errorf("could not delete the default bridge network: %v", err)
        }
        if len(config.NetworkConfig.DefaultAddressPools.Value()) > 0 && !daemon.configStore.LiveRestoreEnabled {
            removeDefaultBridgeInterface()
        }
    }

    if !config.DisableBridge {
        // Initialize default driver "bridge"
        if err := initBridgeDriver(controller, config); err != nil {
            return nil, err
        }
    } else {
        removeDefaultBridgeInterface()
    }

    // Set HostGatewayIP to the default bridge's IP  if it is empty
    if daemon.configStore.HostGatewayIP == nil && controller != nil {
        if n, err := controller.NetworkByName("bridge"); err == nil {
            v4Info, v6Info := n.Info().IpamInfo()
            var gateway net.IP
            if len(v4Info) > 0 {
                gateway = v4Info[0].Gateway.IP
            } else if len(v6Info) > 0 {
                gateway = v6Info[0].Gateway.IP
            }
            daemon.configStore.HostGatewayIP = gateway
        }
    }
    return controller, nil
}

NetworkController Implementation

The controller is the implementation of NetworkController in libnetwork. In this picture, the controller uses a registry map to distinguish network with different types, then use the Driver to create Network and Endpoint, attach the Endpoint to Sandbox or remove them from Sandbox.

The Container uses SandboxID and SandboxKey to find Sandbox. At the same time, Sandbox use containerID to determine which Container it belongs to.

OS Layer Sandbox

Namespace

As indicated by docker-network-sandbox.svg, not all the functions are listed, but only the functions which divide the border of Sandbox. The Namespace implementation will be an example of analytic. netlink provides the functions like route, interface. The Namespace could get netlink as below code. Attention: netlink configure only work in Namespace.

func GetFromPath(path string) (NsHandle, error) {
    fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
    if err != nil {
        return -1, err
    }
    return NsHandle(fd), nil
}

GetFromPath would return NsHandle structure, then NsHandle could be used in below methods to create specific SocketHandle.

func NewHandleAt(ns netns.NsHandle, nlFamilies ...int) (*Handle, error) {

    return newHandle(ns, netns.None(), nlFamilies...)
}

// NewHandleAtFrom works as NewHandle but allows client to specify the
// new and the origin netns Handle.
func NewHandleAtFrom(newNs, curNs netns.NsHandle) (*Handle, error) {
    return newHandle(newNs, curNs)
}

func newHandle(newNs, curNs netns.NsHandle, nlFamilies ...int) (*Handle, error) {
    h := &Handle{sockets: map[int]*nl.SocketHandle{}}
    fams := nl.SupportedNlFamilies
    if len(nlFamilies) != 0 {
        fams = nlFamilies
    }
    for _, f := range fams {
        s, err := nl.GetNetlinkSocketAt(newNs, curNs, f)
        if err != nil {
            return nil, err
        }
        h.sockets[f] = &nl.SocketHandle{Socket: s}
    }
    return h, nil
}

Add Interface

Bridge Network

Create Network

According to the BridgeName value of the config file(networkConfiguration), attempt to find an existing bridge named with the specified name. If not, use the default Bridge -- docker0.

func newInterface(nlh *netlink.Handle, config *networkConfiguration) (*bridgeInterface, error) {
    var err error
    i := &bridgeInterface{nlh: nlh}

    // Initialize the bridge name to the default if unspecified.
    if config.BridgeName == "" {
        config.BridgeName = DefaultBridgeName
    }

    // Attempt to find an existing bridge named with the specified name.
    i.Link, err = nlh.LinkByName(config.BridgeName)
    if err != nil {
        logrus.Debugf("Did not find any interface with name %s: %v", config.BridgeName, err)
    } else if _, ok := i.Link.(*netlink.Bridge); !ok {
        return nil, fmt.Errorf("existing interface %s is not a bridge", i.Link.Attrs().Name)
    }
    return i, nil
}

Create and set network handler in driver

// Create and set network handler in driver
network := &bridgeNetwork{
    id:         config.ID,
    endpoints:  make(map[string]*bridgeEndpoint),
    config:     config,
    portMapper: portmapper.New(d.config.UserlandProxyPath),
    bridge:     bridgeIface,
    driver:     d,
}

d.Lock()
d.networks[config.ID] = network
d.Unlock()

If bridgeInterface exists the valid bridge device, the device and sysctl methods would be added to the queue; if already exists, just add sysctl methods.

bridgeAlreadyExists := bridgeIface.exists()
if !bridgeAlreadyExists {
    bridgeSetup.queueStep(setupDevice)
    bridgeSetup.queueStep(setupDefaultSysctl)
}

// For the default bridge, set expected sysctls
if config.DefaultBridge {
    bridgeSetup.queueStep(setupDefaultSysctl)
}

Add a corresponding setting method to the setting queue according to the configuration file parameters.

for _, step := range []struct {
        Condition bool
        Fn        setupStep
}{
    // Enable IPv6 on the bridge if required. We do this even for a
    // previously  existing bridge, as it may be here from a previous
    // installation where IPv6 wasn't supported yet and needs to be
    // assigned an IPv6 link-local address.
    {config.EnableIPv6, setupBridgeIPv6},

    // We ensure that the bridge has the expectedIPv4 and IPv6 addresses in
    // the case of a previously existing device.
    {bridgeAlreadyExists && !config.InhibitIPv4, setupVerifyAndReconcile},

    // Enable IPv6 Forwarding
    {enableIPv6Forwarding, setupIPv6Forwarding},

    // Setup Loopback Addresses Routing
    {!d.config.EnableUserlandProxy, setupLoopbackAddressesRouting},

    // Setup IPTables.
    {d.config.EnableIPTables, network.setupIPTables},

    //We want to track firewalld configuration so that
    //if it is started/reloaded, the rules can be applied correctly
    {d.config.EnableIPTables, network.setupFirewalld},

    // Setup DefaultGatewayIPv4
    {config.DefaultGatewayIPv4 != nil, setupGatewayIPv4},

    // Setup DefaultGatewayIPv6
    {config.DefaultGatewayIPv6 != nil, setupGatewayIPv6},

    // Add inter-network communication rules.
    {d.config.EnableIPTables, setupNetworkIsolationRules},

    //Configure bridge networking filtering if ICC is off and IP tables are enabled
    {!config.EnableICC && d.config.EnableIPTables, setupBridgeNetFiltering},
} {
    if step.Condition {
        bridgeSetup.queueStep(step.Fn)
    }
}

Add the device start setting method to the setup queue and return to the execution result.

bridgeSetup.queueStep(setupDeviceUp)
return bridgeSetup.apply()

Setup Device

Create a NetLink.bridge structure, using BridGename in the configuration to create LinkAttrs, then create a bridge device using the NetLink method. If need to set MAC, randomly generates the MAC address.

func setupDevice(config *networkConfiguration, i *bridgeInterface) error {
    var setMac bool

    // We only attempt to create the bridge when the requested device name is
    // the default one.
    if config.BridgeName != DefaultBridgeName && config.DefaultBridge {
        return NonDefaultBridgeExistError(config.BridgeName)
    }

    // Set the bridgeInterface netlink.Bridge.
    i.Link = &netlink.Bridge{
        LinkAttrs: netlink.LinkAttrs{
            Name: config.BridgeName,
        },
    }

    // Only set the bridge's MAC address if the kernel version is > 3.3, as it
    // was not supported before that.
    kv, err := kernel.GetKernelVersion()
    if err != nil {
        logrus.Errorf("Failed to check kernel versions: %v. Will not assign a MAC address to the bridge interface", err)
    } else {
        setMac = kv.Kernel > 3 || (kv.Kernel == 3 && kv.Major >= 3)
    }

    if setMac {
        hwAddr := netutils.GenerateRandomMAC()
        i.Link.Attrs().HardwareAddr = hwAddr
        logrus.Debugf("Setting bridge mac address to %s", hwAddr)
    }

    if err = i.nlh.LinkAdd(i.Link); err != nil {
        logrus.Debugf("Failed to create bridge %s via netlink. Trying ioctl", config.BridgeName)
        return ioctlCreateBridge(config.BridgeName, setMac)
    }

    return err
}

So netlink could create and configure Bridge devices.

Networking Configuration

System Control
- /proc/sys/net/ipv6/conf/BridgeName/accept_ra -> 0：Routing suggestions are not accepted
- /proc/sys/net/ipv4/conf/BridgeName/route_localnet -> 1：Redirect external traffic to loopback, need to be used with iptables

IPTABLES

INTERNAL
- filter
  - DOCKER-ISOLATION-STAGE-1 -i BridgeInterface ! -d Network -j DROP
  - DOCKER-ISOLATION-STAGE-1 -o BridgeInterface ! -s Network -j DROP
NON INTERNAL
- nat
  - DOCKER -t nat -i BridgeInterface -j RETURN
- filter
  - FORWARD -i BridgeInterface ! -o BridgeInterface -j ACCEPT
- HOST IP != nil
  - nat
    POSTROUTING -t nat -s BridgeSubnet ! -o BridgeInterface -j SNAT --to-source HOSTIP
    POSTROUTING -t nat -m addrtype --src-type LOCAL -o BridgeInterface -j SNAT --to-source HOSTIP
- HOST IP == nil
  - nat
    POSTROUTING -t nat -s BridgeSubnet ! -o BridgeInterface -j MASQUERADE
    POSTROUTING -t nat -m addrtype --src-type LOCAL -o BridgeInterface -j MASQUERADE
- Inter Container Communication Enabled
  - filter
    FORWARD -i BridgeInterface -o __BridgeInterface -j ACCEPT
- Inter Container Communication Disabled
  - filter
    FORWARD -i BridgeInterface -o __BridgeInterface -j DROP
- nat
  - PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
  - OUTPUT -m addrtype --dst-type LOCAL -j DOCKER
- filter
  - FORWARD -o BridgeInterface -j DOCKER
  - FORWARD -o BridgeInterface -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
- filter
  - -I FORWARD -j DOCKER-ISOLATION-STAGE-1

Create Endpoint

There is a default Bridge device docker0 globally, and each Container has its own independent network protocol stack. Container network and Bridge device communication via Veth pairs Different Container on the same node, 3-layer communication can be performed through the ARP protocol; when the Container traffic out of the Node network, should be redirected by the default gateway device Docker0, and then redirect to Eth0.

Common Tools Go Routine Tools

Parallelize

Stop Channel Mode

Group

Group enhances the ability to sync.Group in the Go standard library. It's used to execute the method to control the termination condition through Context or channel and execute the method in a stand-alone Go Routine.

Forever & Until

JitterUntil

From the comments in the code, we need to solve two key problems, how the sliding works and how the jitterFactor works. Once both of them are solved, everything will be clear.

Sliding

From the code, it's not difficult to see that the sliding is whether the time interval contains execution time. Looking at the 170 lines of code, it's not difficult to guess that Backoff() returns a Timer type, then the Timer start time is the key. One detail needs to be noted in this code, the select starting at line 167 doesn't guarantee order, in other words, if the timer has been triggered and the stopCh has been closed, it isn't necessary to ensure exit. But after entering the lower wheel cycle, due to the 144th line code, it must ensure that the program is normal to exit.

JitterFactor

The jitterFactor works rely on the BackoffManager, let's look at the creation process, the configuration parameters and other associated objects are simply preserved.

Let's look at the implementation of its backoff method again. Pay attention to lines 379-383 to ensure that only one timer is working.

Continue to see the Jitter implementation, add a dynamic value on a fixed duration, and the two problems are solved.

Common Tools Async

Runner

Landscape

References

async.Runner

Common Tools Time Budget

Definition

Creation

The timeBudget is defined as follows.

The creation code is as follows.

Process

Bonus

After the timeBudget starts running, a work collaboration is created. In this coroutine, the budget increase operation will be triggered once a second. If the budget is greater than the upper limit, the upper limit is taken.

Take & Return

The takeAvailable acquires all budgets at a time and resets the budget. The returnUnused returns the remaining budget.

Common Tools Widgets

Buffer

Ring Buffer

Core Scheme

Definition

Relationships

ObjectConvertor

New

AddKnownTypes

Type Default Function

SchemeBuilder

Core The Basics

Object

Overview

The Object instances are as follows. They are basically in the pkg/apis directory, you can find it yourself.

Unstructured

The scene of Unstructured and Object cooperation is as follows.

The example diagram is as follows.

Scheme

Builder

Scheme

AddKnownTypes

AddUnversionedTypes

nameFunc

Others

Conversion

Landscape

Core Function Definitions

The DefaultNameFunc implementation is as follows.

The ConversionFunc declaration is as follows.

FieldMappingFunc converts the key to Field in the source structure and the target structure.

ConversionFuncs

Converter

Briefly explain the following methods:

doConversion

Scope

defaultConvert

Then process them separately according to dv.Kind().

dv.Kind() -> reflect.Struct

dv.Kind() -> reflect.Slice

dv.Kind() -> reflect.Ptr

dv.Kind() -> reflect.Interface

dv.Kind() -> reflect.Map

Core Cacheable Object

Overview

Know the information from the name of CacheableObject. It could store the Object instance. In the 1.18 version of Kubernetes, cachingObject is the only implementation of this interface - CacheableObject. Its relation as this picture.

Combined with CacheableObject definitions, it will be found that when the CacheableObject is stored in Object, Identifier is specified, whether does it mean to save(cache) multiple Object?

The GetObject() method is indeed returned to an Object instance, although the container class structure of Slice, Map cannot be eliminated to implement the possibility of Object interface, it is also a problem that needs to be deeply understood and resolved.

Let us continue and start by trying to solve these two questions and see what can have any gains.

cachingObject

Overview

Each cachingObject actually stores an Object, which is the metaRuntimeInterface instance. According to different Identifiers, this object is encoded into different formats and is cached in the map.

Implementation

Definition

metaRuntimeInterface simultaneously implemented runtime.Object interface and metav1.Object interface.

metav1.Object interface as this defined is using to describe Kuberentes core Object.

Creation

From the new method, we can see that a cachingObject stores an instance, and when it stores an instance, it does not store the original object, but a deep copy

GetObject method Gets a deep copy of the metaRuntimeInterface instance.

Implementation for runtime.Object

CachingObject itself also implement the runtime.Object interface, implemented as follows. It is important to note that in the DeepCopyObject() method, a new SerializationCache is created, and the old content is not copied.

Implementation For CacheableObject

The focus is to replace the atomic.Value , the operation as shown below.

Core Codec

Overview

API Server Routes

本文研究了 Route 部分的源码，配备源码进行进一步理解，可以加深理解,增强相关设计能力。

This paper studies the source code of the Route section. You should read the source code at the same time. It can enhance your design capacity.

APIServerHandler

Overview

The figure below shows the APIServerHandlercore assembly. It is mainly divided into Restful and NonRestful two parts. - Restful is prioritized, and if the processing is successful, exit. does not execute the NonRestful section; - if the RESTful section does not have a target function, the NonRestful section is executed. FullHandlerChain is used for HTTP processing entry points, linking the middleware features, and boots the request to the Director for processing.

The following is the APIServerdefault HandlerChain build process.

API Server API Group

This paper studies the source code of the API Group section. You should read the source code at the same time. It can enhance your design capacity.

Resource Management

Storage

VersionedResourcesStorageMap saves the mapping of version->resources->rest.Storage, the first-level mapping is the version, the second-level is the resource, and the storage is used to solve the creation, modification, and deletion of resource objects.

Install on API Server

getResourceNamesForGroup

APIGroupVersion

Install APIGroup

InstallREST

APIInstaller

Install

registerResourceHandlers

Convert the rest.Storage interface to various operation interfaces, the code is shown below. It can be seen from this that the rest.Storage interface is the key, and we will discuss it in depth later.

Take creater as an example. Finally, register creater or namedCreater on the Post method.

Discovery

In the registration code, we can see that when registering the API, available Resources and restful.WebService is returned. Afterward, immediately register the Resources available to the WebService on the root request of the WebService, and the action is GET.

API Server Storage

This paper studies the source code of the Storage section. You should read the source code at the same time. It can enhance your design capacity.

StorageFactory

The role of StorageFactory is to encapsulate and simplify operations on resources. The main function of StorageFactory is to obtain the storage configuration Config corresponding to the resource according to the incoming GroupResource.

In the API Server, StorageFactory is generated by StorageFactoryConfig, and StorageFactoryConfig is generated by EtcdOption. After all, no matter what changes, etcd storage is the final destination.

DefaultStorageFactory

DefaultStorageFactory is the only implementation of K8S internal StorageFactory before version 1.18. Let's analyze the mode of DefaultStorageFactory in detail.

Cohabitating Resources

The DefaultStorageFactory organizes the associated GroupResources together. As you can see from the above figure, each incoming GroupResource is processed in turn. Therefore, there are also priority issues among the associated GroupResources. The following figure shows the configuration of associated resources in the StorageFactory used when kube-apiserver is created.

RESTOptionsGetter

Etcd configuration and StorageFactory are finally imported into RESTOptionsGetter. RESTOptionsGetter is used as the core configuration item to find the final storage through GroupResource.

The process of creating storage. The interface is shown in the figure below.

StorageFactoryRestOptionFactory

Taking StorageFactoryRestOptionFactory as an example, the steps of the GetRESTOptions method are as follows.

Use StorageFactory to generate Storage Config.
Create a RESTOptions structure and save the generated Storage Config.
Use generic.UndecoratedStorage method as a decorator by default.
If the EnableWatchCache option is turned on, the Decorator will be modified.

UndecoratedStorage

UndecoratedStorage only uses the passed storagebackend.Config parameter

Call factory.Create directly to create the back-end storage.

API Server Cacher

This paper studies the source code of the Storage section. You should read the source code at the same time. It can enhance your design capacity.

Overview

Cacher contains an instance of storage.Interface, which is a real storage backend instance. At the same time, Cacher also implements storage.Interface interface, which is a typical decorator pattern. There are a large number of elegant design patterns in the Kubernetes source code, so you can pay more attention when reading. After simply tracking the code, the current guessed relationship is as follows.

The registry package location is as follows.

The storage package location is as follows.

‌Store initialization code set DryRunnableStorage location.

Store

Interface Definition

The Store interface is defined in k8s.io/client-go. Pay attention to the Add/Update/Delete in the interface, which is used to add objects to the Store. Then the role of this interface is the glue between API Server and Etcd.

Event Main Cycle

The Cacher structure is defined as follows, which contains a watchCache instance.

Look at the Cacher initialization method again. Line 373 is used to create a watchCache instance. The EventHandler passed in is a method of Cacher. In this way, watchCache has a channel for injecting events into Cacher.

The dispatchEvents method in the above code seems to be the part that processes the Event sent from the watchCache method. Let's continue, it seems that we are about to solve the event source problem.

Keep track of incoming, so does processEvent seem familiar?

Go to the watchCache structure and find the place where eventHandler is used.

Continue to dig, so far, we have found the complete source of the event, and there are only three types of events: Add/Update/Delete.

watchCache.processEvent

The generation of the original event to the final event is shown in the figure below. The keyFunc, getAttrsFunc, Indexer, etc. used are all passed in through configuration.

After the event created, refresh the cache.

Event Generation

Cache Watcher

The related structure of cacheWatcher in Cacher is shown in the figure below.

The cacheWatcher implements the watch.Interface interface for monitoring events. The watch.Interface declaration is as follows.

The definition of watch.Event is as follows.

The core processing flow of cacheWatcher is as follows.

Watch

Cacher

The judgment processes of triggerValue and triggerSupported are as follows.

CacheWatcher's Input Channel cache size calculation is as follows.

The specific addition code is as follows

The forgetWatcher is as follows. clean watcher from Cacher.

Event Dispatching

Bookmark Event

In the Cacher event distribution process, a Timer is created. Each time this Timer is triggered, it is possible to generate a Bookmark Event event and distribute this event. The source code is as follows.

Dispatch

After the Bookmark Event is created, the ResourceVersion information of the event object is updated through Versioner, and then the event is distributed. Next, let's take a look at how to distribute.

The Bookmark Event distribution process is shown in the following figure. You can see that the event has been distributed to all cacheWatchers whose IDs are less than the current time.

After arriving at CacheWatcher, the processing is very simple, just returns the original object.

General Event

Dispatch

As you can see from the figure above, when the length of watchersBuffer is greater than or equal to 3, the object is cached for sending. When sending an event, if there is a failure, get an available time slice, within this time slice, try to block sending the event. If all the transmissions are successful, the waiting time slice is exhausted.

If sending fails within the time slice, delete the remaining cacheWatcher.

References

API Server Etcd

This article has studied the source code of the ETCD part, equipped with the source code for further understanding, which can deepen the understanding and enhance related design capabilities.

Hello everyone, this time I bring you ETCD source code reading. The three parts of this article are the Server part, the Storage part, and the Utility part. With the source code for further understanding, you can deepen your understanding and enhance related design capabilities.

Server

Single

Landscape

Clients contain the address to be monitored by the etcd server. The address can be in the form of TCP or Unix Socket and supports http and https. The serverCtx matches a net.Listener and runs independently of a goroutine.

Serve Procedure

Backend

Landscape

Storage

BoltDB Backend

Landscape

run

Start Timer regularly submits or submit and exit when receiving the stop signal. The code is simple, as shown below.

func (b *backend) run() {
    defer close(b.donec)
    t := time.NewTimer(b.batchInterval)
    defer t.Stop()
    for {
        select {
        case <-t.C:
        case <-b.stopc:
            b.batchTx.CommitAndStop()
            return
        }
        b.batchTx.Commit()
        t.Reset(b.batchInterval)
    }
}

Transaction Relationship

Buffer

MVCC

Store

References

Watchable

Landscape

Watcher Creation

Nofity Waiter

Utility

CMux

soheilhy/cmux: Connection multiplexer for GoLang: serve different services on the same port!

Landscape

Implementation

Scheduler

Landscape

WAL

Landscape

API Server Generic API Server

This article has studied the source code of the Generic API Server part, equipped with the source code for further understanding, which can deepen the understanding and enhance related design capabilities.

Delegation Chain

Overview

Server Chain

HTTP Server

Handler Chain

The type of HandlerChainBuilderFn is defined as follows. Pass in an http.Handler instance and return an http.Handler instance. In this way, a middleware-like effect can be achieved.

When creating the ApiServerHandler, use the following method.

Start

The final startup code of preparedAPIAggregator is as follows. It simply calls the Run method of runnable. In Server Chain, we know that runnable is an instance of preparedGenericAPIServer generated by GenericAPIServer included in APIAggregator.

The Run method of PreparedGenericapiServer is as follows.

API Server CustomResourceDefinitions

This article has studied the source code of the CRD part, equipped with the source code for further understanding, which can deepen the understanding and enhance related design capabilities.

ResourceConfig

Default Configuration

Enabled resource configuration and disabled version.

Extend

Enabled the selection as follows.

Runtime Support

The three are shown below.

Storage

Custom Resource Definitions

The Store is expanded as shown below.

State Transition

Landscape

SharedInformerFactory is used to create SharedIndexInformer, which will periodically use Clientset to connect to the API Extension Services of v1beta1 or v1 and notify the respective ResourceEventHandler after obtaining the status change. Here, there are still some issues that need to dig deeper:

How SharedInformerFactory distinguishes different types of resource state changes
Can ResourceEventHandler pay attention to changes in the state of different types of resources at the same time
How are resource status changes obtained

Clientset

The Clientset function is relatively simple. It encapsulates the available API Extension Services. Each RESTClient is connected to the "Loopback" address and sends requests to different services.

SharedInformerFactory

Relationship

Add Informers

Management

EstablishingController

After the EstablishingController is started, it will start a scheduled execution task. This task checks every second whether there is a new Key value in the queue. If there is, update the corresponding resource status on the Server side to Established.

The sync code is as follows.

CRD Handler

The CRD Handler registers event processing with SharedIndexInformer. When the Watch object type is Update, it may be that the state changes to the Established state and needs to be sent to the EstablingController.

When the CRD Handler processes the request, it first checks whether the cache contains the requested object, if so, returns the cached object; if not, it requests the Server and changes the cache status.

CRD Controller

API Server Master Server

This article has studied the source code of the Master Server part, equipped with the source code for further understanding, which can deepen the understanding and enhance related design capabilities.

Server Handler

Serve HTTP Procedure

Add Route

Resource Handler

Install Legacy Resource

First, determine whether the resource configuration of the v1 version is enabled. If enabled, the corresponding resource processing API will be installed. Note that the two core components, StorageFactory and RESTOptionsGetter, have been explained in more detail before.

Create a LegacyRESTStorageProvider object, save the StorageFactory and other necessary information, and then pass in the method InstallLegacyAPI, along with RESTOptionsGetter.

InstallLegacyAPI uses the passed parameters to create APIGroupInfo and install it.

NewLegacyRESTStorage

Create APIGroupInfo.

Create various types of RESTStorage, not all of them are listed in the figure below.

Build resources for Storage mapping.

An associate resource to Storage mapping on version v1.

REST

Each resource type has its own REST package. Generally speaking, REST only needs to simply encapsulate a Store. When creating, it will register NewFunc, NewListFunc, and behavior strategies that match the resource type.

Note that REST is not necessarily only one Store, such as Posstorage.

RESTStorageProvider

RESTStorageProvider cooperates with Resource Config and REST Options to create APIGroupInfo, which is used to register resource processing methods with API Server.

RESTOptionsGetter registers the Store with the Storage Map according to the version and resource check method in APIResourceConfigSource, and finally mounts the Storage Map to APIGroupInfo. Take Auto Scaling as an example, the code is as follows.

The code to create the v1 version of Storage is as follows, and the other parts are similar.

It is not difficult to see that RESTStorageProvider is the core component that undertakes configuration to API Group. Such a design can clearly divide the boundaries of each structure and interface, and set a reasonable process.

Cluster Authentication

Controller Runner

The Listener has only one Enqueue method and is registered somewhere through Notifier. ControllerRunner controls the execution of a task. If it is necessary to notify the outside during the execution process, it will broadcast (or unicast) to the target task queue through the registered Listener list. The queue owner may be a task waiting for the queue to output.

Through this design, use the queue feature to isolate the two related tasks and divide their boundaries. The Enqueue method of the Listener interface has no parameters. Therefore, the implementation of the Listener focuses more on the occurrence of the event rather than the specific details of the event content. This idea is worth learning.

Dynamic CA

PKI certificate and requirements

API Server Aggregator Server

本文研究了 Aggregator Server 部分的源码，配备源码进行进一步理解，可以加深理解,增强相关设计能力。

Service Registration

Workflow

通过 Informer 监控 APIService 资源变更，通过 ResourceEventHandler 放入 Controller 队列。Controller 内部处理逻辑与其他 Controller 一致，最终将 APIService 资源变更情况，反映至 Aggregator Server 的 HTTP 处理部分。

Available Condition Controller

Rebuild Service Cache

监听的是 APIService 资源变更
无论是 Add/Update/Delete，重建 cache 方法一致，使用的是从 API Server 获取的服务列表

Change Condition

AvailableConditionController 的运行协程从 queue 中取出内容，并检查该服务状态后，将服务当前上报至 API Server。

Client Shared Informer

本文研究了 Kubernetes 中 Client Shared Informer 部分的源码，配备源码进行进一步理解，可以加深理解,增强相关设计能力。

Workflow

从接口间关系可以看出，SharedInformer 是核心组件，它通过 Controller 执行操作，并将结果存入 Store 中。SharedIndexInformer 为 SharedInformer 添加了 Index 功能。

Procedure

Run

Add Handler

ListAndWatch

Indexer

[1] cache 根据 Object 生成 Key 的方式如下

[2] items 根据 Key 获取老对象，并设置新对象

[3] updateIndices 代码如下

[4] sharedIndexInformer 在创建 processorListener 时，如果处于工作状态，会调用 indexer 的 List 方法将全部缓存的 object 取出，并发送给新添加的 processorListener。

最终获取全部事件对象位置

本文研究了 Kubernetes 中 Client Shared Informer 部分的源码，是 Client 篇的第一部分，下面是全系列的链接。

Docker Networking

Design

Sandbox: The protocol stack can contain multiple endpoints, which can be implemented by Namespace, Jail, etc.
Endpoint: Connect Sandbox with Network
Network: A collection of Endpoints that can communicate directly, which can be implemented using Bridge, VLAN, etc.

Docker Architecture

Network Controller

Initialize Network Controllers

Docker Daemon Manages the available NetWorkController. When launching Daemon, all available NetWorkController under the current operating system will be created.

On Unix Operating system: with daemon_unix.go as an example, create None, Host, Bridge network controller.

func (daemon *Daemon) initNetworkController(config *config.Config, activeSandboxes map[string]interface{}) (libnetwork.NetworkController, error) {
    netOptions, err := daemon.networkOptions(config, daemon.PluginStore, activeSandboxes)
    if err != nil {
        return nil, err
    }

    controller, err := libnetwork.New(netOptions...)
    if err != nil {
        return nil, fmt.Errorf("error obtaining controller instance: %v", err)
    }

    if len(activeSandboxes) > 0 {
        logrus.Info("There are old running containers, the network config will not take affect")
        return controller, nil
    }

    // Initialize default network on "null"
    if n, _ := controller.NetworkByName("none"); n == nil {
        if _, err := controller.NewNetwork("null", "none", "", libnetwork.NetworkOptionPersist(true)); err != nil {
            return nil, fmt.Errorf("Error creating default \"null\" network: %v", err)
        }
    }

    // Initialize default network on "host"
    if n, _ := controller.NetworkByName("host"); n == nil {
        if _, err := controller.NewNetwork("host", "host", "", libnetwork.NetworkOptionPersist(true)); err != nil {
            return nil, fmt.Errorf("Error creating default \"host\" network: %v", err)
        }
    }

    // Clear stale bridge network
    if n, err := controller.NetworkByName("bridge"); err == nil {
        if err = n.Delete(); err != nil {
            return nil, fmt.Errorf("could not delete the default bridge network: %v", err)
        }
        if len(config.NetworkConfig.DefaultAddressPools.Value()) > 0 && !daemon.configStore.LiveRestoreEnabled {
            removeDefaultBridgeInterface()
        }
    }

    if !config.DisableBridge {
        // Initialize default driver "bridge"
        if err := initBridgeDriver(controller, config); err != nil {
            return nil, err
        }
    } else {
        removeDefaultBridgeInterface()
    }

    // Set HostGatewayIP to the default bridge's IP  if it is empty
    if daemon.configStore.HostGatewayIP == nil && controller != nil {
        if n, err := controller.NetworkByName("bridge"); err == nil {
            v4Info, v6Info := n.Info().IpamInfo()
            var gateway net.IP
            if len(v4Info) > 0 {
                gateway = v4Info[0].Gateway.IP
            } else if len(v6Info) > 0 {
                gateway = v6Info[0].Gateway.IP
            }
            daemon.configStore.HostGatewayIP = gateway
        }
    }
    return controller, nil
}

NetworkController Implementation

The Container uses SandboxID and SandboxKey to find Sandbox. At the same time, Sandbox use containerID to determine which Container it belongs to.

OS Layer Sandbox

Namespace

func GetFromPath(path string) (NsHandle, error) {
    fd, err := syscall.Open(path, syscall.O_RDONLY, 0)
    if err != nil {
        return -1, err
    }
    return NsHandle(fd), nil
}

GetFromPath would return NsHandle structure, then NsHandle could be used in below methods to create specific SocketHandle.

func NewHandleAt(ns netns.NsHandle, nlFamilies ...int) (*Handle, error) {

    return newHandle(ns, netns.None(), nlFamilies...)
}

// NewHandleAtFrom works as NewHandle but allows client to specify the
// new and the origin netns Handle.
func NewHandleAtFrom(newNs, curNs netns.NsHandle) (*Handle, error) {
    return newHandle(newNs, curNs)
}

func newHandle(newNs, curNs netns.NsHandle, nlFamilies ...int) (*Handle, error) {
    h := &Handle{sockets: map[int]*nl.SocketHandle{}}
    fams := nl.SupportedNlFamilies
    if len(nlFamilies) != 0 {
        fams = nlFamilies
    }
    for _, f := range fams {
        s, err := nl.GetNetlinkSocketAt(newNs, curNs, f)
        if err != nil {
            return nil, err
        }
        h.sockets[f] = &nl.SocketHandle{Socket: s}
    }
    return h, nil
}

Add Interface

Bridge Network

Create Network

According to the BridgeName value of the config file(networkConfiguration), attempt to find an existing bridge named with the specified name. If not, use the default Bridge -- docker0.

func newInterface(nlh *netlink.Handle, config *networkConfiguration) (*bridgeInterface, error) {
    var err error
    i := &bridgeInterface{nlh: nlh}

    // Initialize the bridge name to the default if unspecified.
    if config.BridgeName == "" {
        config.BridgeName = DefaultBridgeName
    }

    // Attempt to find an existing bridge named with the specified name.
    i.Link, err = nlh.LinkByName(config.BridgeName)
    if err != nil {
        logrus.Debugf("Did not find any interface with name %s: %v", config.BridgeName, err)
    } else if _, ok := i.Link.(*netlink.Bridge); !ok {
        return nil, fmt.Errorf("existing interface %s is not a bridge", i.Link.Attrs().Name)
    }
    return i, nil
}

Create and set network handler in driver

// Create and set network handler in driver
network := &bridgeNetwork{
    id:         config.ID,
    endpoints:  make(map[string]*bridgeEndpoint),
    config:     config,
    portMapper: portmapper.New(d.config.UserlandProxyPath),
    bridge:     bridgeIface,
    driver:     d,
}

d.Lock()
d.networks[config.ID] = network
d.Unlock()

If bridgeInterface exists the valid bridge device, the device and sysctl methods would be added to the queue; if already exists, just add sysctl methods.

bridgeAlreadyExists := bridgeIface.exists()
if !bridgeAlreadyExists {
    bridgeSetup.queueStep(setupDevice)
    bridgeSetup.queueStep(setupDefaultSysctl)
}

// For the default bridge, set expected sysctls
if config.DefaultBridge {
    bridgeSetup.queueStep(setupDefaultSysctl)
}

Add a corresponding setting method to the setting queue according to the configuration file parameters.

for _, step := range []struct {
        Condition bool
        Fn        setupStep
}{
    // Enable IPv6 on the bridge if required. We do this even for a
    // previously  existing bridge, as it may be here from a previous
    // installation where IPv6 wasn't supported yet and needs to be
    // assigned an IPv6 link-local address.
    {config.EnableIPv6, setupBridgeIPv6},

    // We ensure that the bridge has the expectedIPv4 and IPv6 addresses in
    // the case of a previously existing device.
    {bridgeAlreadyExists && !config.InhibitIPv4, setupVerifyAndReconcile},

    // Enable IPv6 Forwarding
    {enableIPv6Forwarding, setupIPv6Forwarding},

    // Setup Loopback Addresses Routing
    {!d.config.EnableUserlandProxy, setupLoopbackAddressesRouting},

    // Setup IPTables.
    {d.config.EnableIPTables, network.setupIPTables},

    //We want to track firewalld configuration so that
    //if it is started/reloaded, the rules can be applied correctly
    {d.config.EnableIPTables, network.setupFirewalld},

    // Setup DefaultGatewayIPv4
    {config.DefaultGatewayIPv4 != nil, setupGatewayIPv4},

    // Setup DefaultGatewayIPv6
    {config.DefaultGatewayIPv6 != nil, setupGatewayIPv6},

    // Add inter-network communication rules.
    {d.config.EnableIPTables, setupNetworkIsolationRules},

    //Configure bridge networking filtering if ICC is off and IP tables are enabled
    {!config.EnableICC && d.config.EnableIPTables, setupBridgeNetFiltering},
} {
    if step.Condition {
        bridgeSetup.queueStep(step.Fn)
    }
}

Add the device start setting method to the setup queue and return to the execution result.

bridgeSetup.queueStep(setupDeviceUp)
return bridgeSetup.apply()

Setup Device

func setupDevice(config *networkConfiguration, i *bridgeInterface) error {
    var setMac bool

    // We only attempt to create the bridge when the requested device name is
    // the default one.
    if config.BridgeName != DefaultBridgeName && config.DefaultBridge {
        return NonDefaultBridgeExistError(config.BridgeName)
    }

    // Set the bridgeInterface netlink.Bridge.
    i.Link = &netlink.Bridge{
        LinkAttrs: netlink.LinkAttrs{
            Name: config.BridgeName,
        },
    }

    // Only set the bridge's MAC address if the kernel version is > 3.3, as it
    // was not supported before that.
    kv, err := kernel.GetKernelVersion()
    if err != nil {
        logrus.Errorf("Failed to check kernel versions: %v. Will not assign a MAC address to the bridge interface", err)
    } else {
        setMac = kv.Kernel > 3 || (kv.Kernel == 3 && kv.Major >= 3)
    }

    if setMac {
        hwAddr := netutils.GenerateRandomMAC()
        i.Link.Attrs().HardwareAddr = hwAddr
        logrus.Debugf("Setting bridge mac address to %s", hwAddr)
    }

    if err = i.nlh.LinkAdd(i.Link); err != nil {
        logrus.Debugf("Failed to create bridge %s via netlink. Trying ioctl", config.BridgeName)
        return ioctlCreateBridge(config.BridgeName, setMac)
    }

    return err
}

So netlink could create and configure Bridge devices.

Networking Configuration

System Control
- /proc/sys/net/ipv6/conf/BridgeName/accept_ra -> 0：Routing suggestions are not accepted
- /proc/sys/net/ipv4/conf/BridgeName/route_localnet -> 1：Redirect external traffic to loopback, need to be used with iptables

IPTABLES

INTERNAL
- filter
  - DOCKER-ISOLATION-STAGE-1 -i BridgeInterface ! -d Network -j DROP
  - DOCKER-ISOLATION-STAGE-1 -o BridgeInterface ! -s Network -j DROP
NON INTERNAL
- nat
  - DOCKER -t nat -i BridgeInterface -j RETURN
- filter
  - FORWARD -i BridgeInterface ! -o BridgeInterface -j ACCEPT
- HOST IP != nil
  - nat
    POSTROUTING -t nat -s BridgeSubnet ! -o BridgeInterface -j SNAT --to-source HOSTIP
    POSTROUTING -t nat -m addrtype --src-type LOCAL -o BridgeInterface -j SNAT --to-source HOSTIP
- HOST IP == nil
  - nat
    POSTROUTING -t nat -s BridgeSubnet ! -o BridgeInterface -j MASQUERADE
    POSTROUTING -t nat -m addrtype --src-type LOCAL -o BridgeInterface -j MASQUERADE
- Inter Container Communication Enabled
  - filter
    FORWARD -i BridgeInterface -o __BridgeInterface -j ACCEPT
- Inter Container Communication Disabled
  - filter
    FORWARD -i BridgeInterface -o __BridgeInterface -j DROP
- nat
  - PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
  - OUTPUT -m addrtype --dst-type LOCAL -j DOCKER
- filter
  - FORWARD -o BridgeInterface -j DOCKER
  - FORWARD -o BridgeInterface -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
- filter
  - -I FORWARD -j DOCKER-ISOLATION-STAGE-1

Create Endpoint

Core The Basics

Object

Overview

The Object instances are as follows. They are basically in the pkg/apis directory, you can find it yourself.

Unstructured

The scene of Unstructured and Object cooperation is as follows.

The example diagram is as follows.

Scheme

Builder

Scheme

Look at the definition of , the first four are maintained the relationship between reflect.Type and Schema.GroupVersionKind. The defaulterFuncs is used to build a default object.

type Scheme struct {
    // versionMap allows one to figure out the go type of an object with
    // the given version and name.
    gvkToType map[schema.GroupVersionKind]reflect.Type

    // typeToGroupVersion allows one to find metadata for a given go object.
    // The reflect.Type we index by should *not* be a pointer.
    typeToGVK map[reflect.Type][]schema.GroupVersionKind

    // unversionedTypes are transformed without conversion in ConvertToVersion.
    unversionedTypes map[reflect.Type]schema.GroupVersionKind

    // unversionedKinds are the names of kinds that can be created in the context of any group
    // or version
    // TODO: resolve the status of unversioned types.
    unversionedKinds map[string]reflect.Type

    // Map from version and resource to the corresponding func to convert
    // resource field labels in that version to internal version.
    fieldLabelConversionFuncs map[schema.GroupVersionKind]FieldLabelConversionFunc

    // defaulterFuncs is an array of interfaces to be called with an object to provide defaulting
    // the provided object must be a pointer.
    defaulterFuncs map[reflect.Type]func(interface{})

    // converter stores all registered conversion functions. It also has
    // default converting behavior.
    converter *conversion.Converter

    // versionPriority is a map of groups to ordered lists of versions for those groups indicating the
    // default priorities of these versions as registered in the scheme
    versionPriority map[string][]string

    // observedVersions keeps track of the order we've seen versions during type registration
    observedVersions []schema.GroupVersion

    // schemeName is the name of this scheme.  If you don't specify a name, the stack of the NewScheme caller will be used.
    // This is useful for error reporting to indicate the origin of the scheme.
    schemeName string
}

is used to convert label and value to internal label and value.

type FieldLabelConversionFunc func(label, value string) (internalLabel, internalValue string, err error)

AddKnownTypes

only needs to pay attention to one problem, that is, the GroupVersionKind is generated from the incoming GroupVersion through the Name method of reflect.Type as the Kind. Please see the simplified sample . The sample code can be executed under .

AddUnversionedTypes

The principle of is as follows. Unversioned Type can be understood as an Object mounted on a Group, and the Version will never be updated.

nameFunc

The principle of is as follows, just pay attention to the return type priority to Internal Type.

Others

Conversion

Landscape

The is used to represent the combination of source type and target type, stores the type and type name. is used as the conversion method from the default type to Name. defines the object conversion method.

Core Function Definitions

The DefaultNameFunc implementation is as follows.

var DefaultNameFunc = func(t reflect.Type) string { return t.Name() }

The ConversionFunc declaration is as follows.

type ConversionFunc func(a, b interface{}, scope Scope) error

FieldMappingFunc converts the key to Field in the source structure and the target structure.

type FieldMappingFunc func(key string, sourceTag, destTag reflect.StructTag) (source string, dest string)

ConversionFuncs

Converter

Briefly explain the following methods:

calls ConversionFuncs.Add method directly.
calls ConversionFuncs.AddUntyped method.
will not do the type of conversion record in the mapping.
register input type Field conversion method.

doConversion

When Converter executes object conversion methods, such as and , it is allowed to pass in a Meta object and execute the method to construct the scope object in this method.

Scope

defaultConvert

The handles the default type conversion, the incoming sv, dv have been ensured to be addressable through . This part of the code is a nearly perfect application of the reflect package in Go.

First, deal with the conversion of basic types, which can be converted by or .

switch st.Kind() {
    case reflect.Map, reflect.Ptr, reflect.Slice, reflect.Interface, reflect.Struct:
    // 这些类型后续处理
    default:
    // This should handle all simple types.
    if st.AssignableTo(dt) {
        dv.Set(sv)
        return nil
    }
    if st.ConvertibleTo(dt) {
        dv.Set(sv.Convert(dt))
        return nil
    }
}

Then process them separately according to dv.Kind().

dv.Kind() -> reflect.Struct

Return the result of the method directly. However, you need to pay attention to first convert sv and dv into the form of Key/Value respectively. Please study the method by yourself.

return c.convertKV(toKVValue(sv), toKVValue(dv), scope)

dv.Kind() -> reflect.Slice

case reflect.Slice:
    if sv.IsNil() {
        // Don't make a zero-length slice.
        dv.Set(reflect.Zero(dt))
        return nil
    }
    dv.Set(reflect.MakeSlice(dt, sv.Len(), sv.Cap()))
    for i := 0; i < sv.Len(); i++ {
        scope.setIndices(i, i)
        if err := c.convert(sv.Index(i), dv.Index(i), scope); err != nil {
            return err
        }
    }

dv.Kind() -> reflect.Ptr

case reflect.Ptr:
    if sv.IsNil() {
        // Don't copy a nil ptr!
        dv.Set(reflect.Zero(dt))
        return nil
    }
    dv.Set(reflect.New(dt.Elem()))
    switch st.Kind() {
        case reflect.Ptr, reflect.Interface:
        return c.convert(sv.Elem(), dv.Elem(), scope)
        default:
        return c.convert(sv, dv.Elem(), scope)
    }

dv.Kind() -> reflect.Interface

case reflect.Interface:
    if sv.IsNil() {
        // Don't copy a nil interface!
        dv.Set(reflect.Zero(dt))
        return nil
    }
    tmpdv := reflect.New(sv.Elem().Type()).Elem()
    if err := c.convert(sv.Elem(), tmpdv, scope); err != nil {
        return err
    }
    dv.Set(reflect.ValueOf(tmpdv.Interface()))
    return nil

dv.Kind() -> reflect.Map

case reflect.Map:
    if sv.IsNil() {
        // Don't copy a nil ptr!
        dv.Set(reflect.Zero(dt))
        return nil
    }
    dv.Set(reflect.MakeMap(dt))
    for _, sk := range sv.MapKeys() {
        dk := reflect.New(dt.Key()).Elem()
        if err := c.convert(sk, dk, scope); err != nil {
            return err
        }
        dkv := reflect.New(dt.Elem()).Elem()
        scope.setKeys(sk.Interface(), dk.Interface())

        if err := c.convert(sv.MapIndex(sk), dkv, scope); err != nil {
            return err
        }
        dv.SetMapIndex(dk, dkv)
    }