
Ethereum: Finding reliable P2P working nodes in a decentralized world

Vitalik 2025-9-18 15:24 70479 views ETH


Chapter 1: The need for a decentralized network


As a decentralized blockchain network, Ethereum relies on its P2P network layer for four key tasks: enabling direct communication between nodes, letting the network grow and shrink dynamically, keeping blockchain state synchronized and consistent across the whole network, and defending against network attacks and malicious behavior. This layer gives the blockchain a flexible network infrastructure that needs no central server.

1.2 Layered architecture overview


The Go-Ethereum P2P network uses a classic layered architecture:
┌───────────────────────────────────────────────────────────────────────────────┐
│                          Application protocol layer                           │
├───────────────────┬───────────────────┬───────────────────┬───────────────────┤
│ ETH Protocol      │ SNAP Protocol     │ LES Protocol      │ Custom protocols  │
│ Chain data sync   │ Fast state sync   │ Light clients     │ Extensions        │
└───────────────────┴───────────────────┴───────────────────┴───────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                                P2P core layer                                 │
├───────────────────┬───────────────────┬───────────────────────────────────────┤
│ p2p.Server        │ Protocol Manager  │ Peer Manager                          │
│ Network service   │ Protocol handling │ Node management                       │
└───────────────────┴───────────────────┴───────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                          Transport abstraction layer                          │
├───────────────────┬───────────────────┬───────────────────────────────────────┤
│ RLPx Transport    │ Message Codec     │ Connection Pool                       │
│ Encrypted channel │ Encode / decode   │ Pool management                       │
└───────────────────┴───────────────────┴───────────────────────────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                            Network discovery layer                            │
├───────────────────┬───────────────────┬───────────────────┬───────────────────┤
│ Discovery v4      │ Discovery v5      │ DNS Discovery     │ Bootstrap Nodes   │
│ Kademlia DHT      │ Enhanced discovery│ DNS node lists    │ Seed nodes        │
└───────────────────┴───────────────────┴───────────────────┴───────────────────┘
                                        │
                                        ▼
┌───────────────────────────────────────────────────────────────────────────────┐
│                           Underlying network layer                            │
├───────────────────┬───────────────────┬───────────────────────────────────────┤
│ TCP Listener      │ UDP Socket        │ Network Interface                     │
│ TCP listening     │ UDP datagrams     │ Local interfaces                      │
└───────────────────┴───────────────────┴───────────────────────────────────────┘

Chapter 2: P2P Core Implementation

2.1 Architecture design


The P2P Server uses an event-driven architecture: it receives and processes different types of events through multiple dedicated channels, which avoids complex shared-state management and lock contention. Each functional module (node discovery, connection dialing, network listening, and so on) is an independent component that interacts with the others through well-defined interfaces, which keeps the components decoupled. For concurrency safety, components communicate over channels wherever possible, which keeps the code safe under concurrent access without giving up throughput.


Life cycle management mechanism


Server life cycle management follows strict state transitions to ensure system stability and reliability.


Concurrency control strategy


Four concurrency-control mechanisms are combined: mutexes protect critical state, channels carry asynchronous communication, wait groups track goroutine life cycles, and contexts handle timeouts and cancellation. Preferring channels over locks wherever possible avoids excessive synchronization and improves concurrent performance.

Event channel design


The event system is built on channels. Seven dedicated channels handle different types of events: the quit channel delivers the graceful shutdown signal; the addtrusted and removetrusted channels add and remove trusted nodes at runtime; the peerOp channel carries node operation requests; the delpeer channel carries node disconnect events; and the checkpointPostHandshake and checkpointAddPeer channels handle the post-handshake and add-peer checkpoints. All of these channels converge in the main event loop, Server.run, which schedules and processes them in one place, so the server responds to network events promptly and updates state and resources accordingly.
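The single-loop pattern described here can be sketched in a few lines of Go. This is a toy model under stated assumptions, not the go-ethereum code: the `server` type, its `addPeer` channel, and the `peerCount` helper are hypothetical names chosen for illustration, and peers are plain strings.

```go
package main

import "fmt"

// server is a toy stand-in for a P2P server; all names here are illustrative.
type server struct {
	quit    chan struct{}
	addPeer chan string
	delPeer chan string
	peerOp  chan func(map[string]bool)
	done    chan struct{}
}

func newServer() *server {
	return &server{
		quit:    make(chan struct{}),
		addPeer: make(chan string),
		delPeer: make(chan string),
		peerOp:  make(chan func(map[string]bool)),
		done:    make(chan struct{}),
	}
}

// run is the single event loop. It alone owns the peers map, so the map
// needs no mutex: every change arrives as an event on one of the channels.
func (s *server) run() {
	defer close(s.done)
	peers := make(map[string]bool)
	for {
		select {
		case <-s.quit:
			return
		case id := <-s.addPeer:
			peers[id] = true
		case id := <-s.delPeer:
			delete(peers, id)
		case op := <-s.peerOp:
			op(peers) // run a caller-supplied query inside the loop
		}
	}
}

// peerCount asks the loop for the current number of peers.
func (s *server) peerCount() int {
	res := make(chan int, 1)
	s.peerOp <- func(peers map[string]bool) { res <- len(peers) }
	return <-res
}

func main() {
	s := newServer()
	go s.run()
	s.addPeer <- "node-a"
	s.addPeer <- "node-b"
	s.delPeer <- "node-a"
	fmt.Println("peers:", s.peerCount())
	close(s.quit)
	<-s.done
}
```

Because only `run` touches the map, other goroutines read or change peer state by sending an event or a closure, which is the lock-free style the section describes.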


2.2 Connection management mechanism

TCP listener implementation principle


The TCP listener accepts new connections continuously in its own goroutine, so it never blocks the rest of the server. Before a connection is established, the system checks IP restrictions and the current connection count, and a semaphore caps the number of concurrent pending connections to prevent resource exhaustion. Each new connection is handled in a separate goroutine, so a single slow connection cannot stall the accept loop. If listening fails, the listener cleans up its resources and exits. It also supports graceful shutdown: it stops accepting new connections and lets existing ones finish properly. In addition, the listener keeps the local node's network address information up to date.
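A buffered channel is a common way to build such a semaphore in Go. The sketch below is a minimal illustration with hypothetical names (`slots`, `tryAcquire`, `release`). A real accept loop would more likely block until a slot is free before calling Accept, and would release the slot once the handshake finishes.

```go
package main

import "fmt"

// slots is a counting semaphore: each value in the channel is one free slot.
type slots chan struct{}

// newSlots creates a semaphore with n free slots.
func newSlots(n int) slots {
	s := make(slots, n)
	for i := 0; i < n; i++ {
		s <- struct{}{}
	}
	return s
}

// tryAcquire takes a slot if one is free and reports whether it succeeded.
func (s slots) tryAcquire() bool {
	select {
	case <-s:
		return true
	default:
		return false
	}
}

// release returns a slot, for example after a connection's handshake ends.
func (s slots) release() { s <- struct{}{} }

func main() {
	s := newSlots(2)
	fmt.Println(s.tryAcquire(), s.tryAcquire(), s.tryAcquire())
	s.release()
	fmt.Println(s.tryAcquire())
}
```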

Connection establishment process principle


The connection establishment process consists of five key stages:
  1. Pre-check: verify that the current number of connections does not exceed the MaxPeers limit, check whether the remote IP is on the network restriction list, check for duplicate connections, and confirm that the number of inbound connections does not exceed the configured upper limit.
  2. Transport-layer handshake: establish the RLPx secure channel using ECIES elliptic curve encryption and generate session keys that protect the confidentiality and integrity of subsequent communication.
  3. Authentication: keep malicious nodes out by verifying the remote node's public key signature, checking the validity of the node ID, and confirming that the node is not blacklisted.
  4. Protocol negotiation: both sides exchange Hello messages to agree on the protocol versions and capabilities they both support, which determines the final set of enabled protocols.
  5. Connection activation: start a handler for each successfully negotiated protocol and add the new connection to the active connection pool for unified management.
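The pre-check stage can be sketched as a pure function over the current peer set. Everything below is a simplified assumption for illustration: the `conn` and `limits` types and the error values are hypothetical, the IP restriction list is omitted, and trusted connections are simply exempted from the count limits.

```go
package main

import (
	"errors"
	"fmt"
)

// conn describes a candidate or established connection (illustrative type).
type conn struct {
	id      string
	inbound bool
	trusted bool
}

// limits holds the configured connection caps (illustrative type).
type limits struct {
	maxPeers   int
	maxInbound int
}

var (
	errTooManyPeers     = errors.New("too many peers")
	errTooManyInbound   = errors.New("too many inbound connections")
	errAlreadyConnected = errors.New("already connected")
)

// preCheck decides whether connection c may proceed given the active peers.
func preCheck(l limits, peers map[string]conn, c conn) error {
	inbound := 0
	for _, p := range peers {
		if p.inbound {
			inbound++
		}
	}
	switch {
	case !c.trusted && len(peers) >= l.maxPeers:
		return errTooManyPeers
	case !c.trusted && c.inbound && inbound >= l.maxInbound:
		return errTooManyInbound
	}
	if _, ok := peers[c.id]; ok {
		return errAlreadyConnected
	}
	return nil
}

func main() {
	l := limits{maxPeers: 2, maxInbound: 1}
	peers := map[string]conn{"a": {id: "a", inbound: true}}
	fmt.Println(preCheck(l, peers, conn{id: "b", inbound: true}))
	fmt.Println(preCheck(l, peers, conn{id: "a"}))
	fmt.Println(preCheck(l, peers, conn{id: "c"}))
}
```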

Connection pool management


Connection pool management applies fine-grained control:
  1. Several limits work together, including the total number of connections, the number of inbound connections, and the number of connections per protocol, to keep resource usage under precise control.
  2. Connections are divided into three types according to their source and purpose: static, dynamic, and trusted. Each type is managed differently to meet different network requirements.
  3. The dial ratio balances inbound and outbound connections, which keeps the node's connections diverse and stable.
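As a rough illustration of how a dial ratio can split connection slots, the sketch below reserves `maxPeers / dialRatio` slots for outbound dials and leaves the rest for inbound connections. The function names and the fallback ratio of 3 are assumptions made for this example, not values taken from the text above.

```go
package main

import "fmt"

// defaultDialRatio is an assumed fallback used when no ratio is configured.
const defaultDialRatio = 3

// maxDialed returns how many slots are reserved for outbound (dialed) peers.
func maxDialed(maxPeers, dialRatio int) int {
	if maxPeers == 0 {
		return 0
	}
	if dialRatio == 0 {
		dialRatio = defaultDialRatio
	}
	limit := maxPeers / dialRatio
	if limit == 0 {
		limit = 1 // always allow at least one outbound connection
	}
	return limit
}

// maxInbound returns the slots left over for inbound peers.
func maxInbound(maxPeers, dialRatio int) int {
	return maxPeers - maxDialed(maxPeers, dialRatio)
}

func main() {
	fmt.Println(maxDialed(50, 0), maxInbound(50, 0))
}
```

With 50 peer slots and the assumed ratio of 3, 16 slots go to outbound dials and 34 remain for inbound connections.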

Connection status maintenance


Each connection has full status tracking and life cycle management:
  1. The node ID combined with the connection flags uniquely identifies each connection, so connections can be identified and tracked accurately.
  2. The protocol capabilities supported by each connection are recorded and kept up to date, which provides the basis for protocol selection and message routing.
  3. Connection state changes are monitored in real time, so network anomalies and connection failures are detected and handled promptly.
  4. Network resources are bound to connection objects, so they can be allocated, monitored, and reclaimed in one place, which keeps resource usage efficient and connection management consistent.


2.3 Peer node management

Peer object life cycle


A Peer goes through a complete life cycle. It starts when a Peer object is created from an established connection and its channels are initialized. All supported protocol handlers are then started, each with its own message processing loop. While running, the Peer processes protocol messages, maintains connection state, and monitors connection health. At the end, it shuts the protocol handlers down gracefully, cleans up resources, and disconnects. Channels and wait groups keep this design safe when many goroutines are involved, and errors are isolated per protocol so that a failure in one protocol does not affect the others. Goroutine life cycles and resources are managed automatically, and a subscription mechanism notifies interested parties of connection events.






2.4 Protocol multiplexing

Protocol registration mechanism


The protocol registration mechanism manages protocols as modules through a standard protocol interface. Each protocol declares core attributes such as its name, version, number of message codes, run function, and node information. The system keeps all registered protocols in a protocol registry and announces the list of supported protocols to peers as capabilities during the handshake phase.

Message routing implementation


Message routing implements protocol isolation through the message code offset mechanism. Each protocol is allocated an independent message code space, and the router distributes the message to the corresponding protocol processor according to the message code range. At the same time, each protocol processor has independent input channels, write control and error handling mechanisms to ensure complete isolation and concurrency safety between protocols.
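The offset mechanism amounts to a range lookup. The following sketch assumes each negotiated protocol owns a contiguous block of message codes; the `protoRange` type and the example offsets and lengths are illustrative values, not taken from the text.

```go
package main

import "fmt"

// protoRange is the block of message codes owned by one protocol.
type protoRange struct {
	name   string
	offset uint64 // first wire-level code owned by the protocol
	length uint64 // number of message codes the protocol defines
}

// route maps a wire-level message code to a protocol and its local code.
func route(protos []protoRange, code uint64) (name string, local uint64, ok bool) {
	for _, p := range protos {
		if code >= p.offset && code < p.offset+p.length {
			return p.name, code - p.offset, true
		}
	}
	return "", 0, false
}

func main() {
	// Example layout: illustrative offsets and lengths.
	protos := []protoRange{
		{name: "eth", offset: 16, length: 17},
		{name: "snap", offset: 33, length: 8},
	}
	fmt.Println(route(protos, 16))
	fmt.Println(route(protos, 35))
	fmt.Println(route(protos, 41))
}
```

Each handler only ever sees its local code, so two protocols can both define a message 0 without colliding on the wire.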

Protocol version negotiation


Protocol version negotiation uses a best-match strategy. The capabilities of both sides are first sorted by name and version. The local protocol list is then compared with the remote capability list: when a name and version match, a handler is created and a message code offset is assigned; when several versions of the same protocol match, the highest shared version is kept. Protocols without a match are skipped, and the process continues until every protocol has been checked.
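A minimal sketch of this matching step follows. It assumes that the first 16 message codes are reserved for the base protocol and that a later, higher version of the same protocol replaces an earlier match and reuses its code space. The type names and message counts are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// capability is what the remote side announces in its Hello message.
type capability struct {
	name    string
	version uint
}

// protocol is a locally registered protocol with its message code count.
type protocol struct {
	name    string
	version uint
	length  uint64
}

// matched is a negotiated protocol plus its assigned message code offset.
type matched struct {
	protocol
	offset uint64
}

// baseCodes is the assumed number of codes reserved for the base protocol.
const baseCodes = 16

func matchProtocols(local []protocol, remote []capability) map[string]matched {
	caps := append([]capability(nil), remote...)
	sort.Slice(caps, func(i, j int) bool {
		if caps[i].name != caps[j].name {
			return caps[i].name < caps[j].name
		}
		return caps[i].version < caps[j].version
	})
	result := make(map[string]matched)
	offset := uint64(baseCodes)
	for _, c := range caps {
		for _, p := range local {
			if p.name != c.name || p.version != c.version {
				continue
			}
			if old, ok := result[c.name]; ok {
				offset -= old.length // a lower version matched first: reuse its space
			}
			result[c.name] = matched{protocol: p, offset: offset}
			offset += p.length
		}
	}
	return result
}

func main() {
	local := []protocol{{"eth", 67, 17}, {"eth", 68, 17}, {"snap", 1, 8}}
	remote := []capability{{"snap", 1}, {"eth", 68}, {"eth", 67}, {"les", 4}}
	for name, m := range matchProtocols(local, remote) {
		fmt.Printf("%s/%d at offset %d\n", name, m.version, m.offset)
	}
}
```

Sorting first guarantees that versions of the same protocol are visited in ascending order, so the last match for a name is the highest shared version.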

Chapter 3: In-depth analysis of node discovery mechanism

3.1 Discovery v4 protocol implementation

Kademlia DHT algorithm principle


Kademlia is a distributed hash table algorithm based on an XOR distance metric, and Go-Ethereum builds its node discovery network on Kademlia's core idea. The algorithm assigns every node in the network a unique 256-bit identifier and computes the "distance" between two nodes as the XOR of their identifiers.

┌─────────────────────────────────────────────────────────────┐
│ 1. Node ID space (256-bit identifiers)                      │
│    • Each node gets a unique 256-bit ID                     │
│    • IDs are usually a hash of the node's public key        │
│    • Forms a huge ID space of size 2^256                    │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. XOR distance metric d(x,y) = x ⊕ y                       │
│    • Distance between nodes is computed with XOR            │
│    • A smaller value means the nodes are "closer"           │
│    • Symmetric and satisfies the triangle inequality        │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. Binary prefix tree                                       │
│    • XOR distance naturally forms a binary tree             │
│    • Nodes sharing a longer prefix are closer               │
│    • Supports efficient hierarchical routing                │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. K-bucket routing table                                   │
│    • 256 buckets, each covering one distance range          │
│    • Each bucket holds at most K nodes (usually K=16)       │
│    • Lookups complete in O(log N) steps                     │
└─────────────────────────────────────────────────────────────┘

Distance calculation example:
┌─────────────────────────────────────────────────────────────┐
│ Node A ID: 1010110100...  (256 bits)                        │
│ Node B ID: 1100101011...  (256 bits)                        │
│            ─────────────                                    │
│ XOR:       0110011111...  (bitwise XOR)                     │
│                                                             │
│ Distance: count the leading zeros of the XOR result         │
│ • More leading zeros means a smaller distance               │
│ • The leading-zero count selects the bucket                 │
│ • e.g. 2 leading zeros → distance in [2^253, 2^254)         │
└─────────────────────────────────────────────────────────────┘
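The leading-zero rule can be written as a short function. The sketch below computes the logarithmic distance between two 256-bit IDs, that is, 256 minus the number of leading zero bits of their XOR. The `nodeID` type and `logDist` name are chosen for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nodeID is a 256-bit identifier.
type nodeID [32]byte

// logDist returns 256 minus the number of leading zero bits of a XOR b.
// Identical IDs have distance 0; IDs that differ in the top bit have 256.
func logDist(a, b nodeID) int {
	lz := 0
	for i := range a {
		x := a[i] ^ b[i]
		if x == 0 {
			lz += 8
			continue
		}
		lz += bits.LeadingZeros8(x)
		break
	}
	return len(a)*8 - lz
}

func main() {
	var a, b nodeID
	a[0] = 0b10101101 // first eight bits of node A in the example
	b[0] = 0b11001010 // first eight bits of node B in the example
	fmt.Println(logDist(a, b))
}
```

For the two example IDs the XOR starts with a single zero bit, so the logarithmic distance is 255, meaning the two nodes sit in nearly the farthest possible distance range from each other.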

Node search algorithm process:



The lookup is iterative: each round picks the α nodes closest to the target that have not yet been queried (usually α=3) and sends them requests in parallel. The search range narrows with every round until the K nodes closest to the target are found.
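The iteration can be sketched against an in-memory network. The example below is a simplification under clear assumptions: IDs are 16-bit numbers instead of 256-bit hashes, K is shrunk to 4 for readability, the "network" is a map from each node to the nodes it knows, and the α queries of a round run one after another rather than in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	alpha = 3 // queries per round
	k     = 4 // result size, kept small for the demo
)

// network maps each node to the nodes it knows about (simulated tables).
type network map[uint16][]uint16

func sortByDistance(ids []uint16, target uint16) {
	sort.Slice(ids, func(i, j int) bool { return ids[i]^target < ids[j]^target })
}

// findNode simulates a Findnode request: the queried node answers with
// up to k of its known nodes that are closest to the target.
func (n network) findNode(node, target uint16) []uint16 {
	known := append([]uint16(nil), n[node]...)
	sortByDistance(known, target)
	if len(known) > k {
		known = known[:k]
	}
	return known
}

// lookup returns up to k nodes closest to target, starting from seed nodes.
func lookup(n network, seeds []uint16, target uint16) []uint16 {
	seen := map[uint16]bool{}
	asked := map[uint16]bool{}
	var closest []uint16
	add := func(ids []uint16) {
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				closest = append(closest, id)
			}
		}
		sortByDistance(closest, target)
		if len(closest) > k {
			closest = closest[:k]
		}
	}
	add(seeds)
	for {
		var batch []uint16
		for _, id := range closest {
			if !asked[id] && len(batch) < alpha {
				batch = append(batch, id)
			}
		}
		if len(batch) == 0 {
			return closest // every close node has been queried
		}
		for _, id := range batch {
			asked[id] = true
			add(n.findNode(id, target))
		}
	}
}

func demoNetwork() network {
	return network{
		1: {2, 8}, 2: {1, 12}, 8: {1, 9}, 9: {8},
		12: {2, 13, 14}, 13: {12}, 14: {12, 15}, 15: {14},
	}
}

func main() {
	fmt.Println(lookup(demoNetwork(), []uint16{1}, 15))
}
```

Starting from node 1, the lookup walks through nodes 2, 12, and 14 before it reaches node 15, even though node 1 knows none of them directly except node 2.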

Routing table maintenance mechanism:
  • Node verification and replacement: nodes in each bucket are revalidated regularly and failed nodes are removed. When a node is seen, its timestamp is updated if it is already in the bucket; it is added directly if the bucket has room; and if the bucket is full, it may replace the oldest node.
  • Bucket refresh: buckets that have not been updated for a long time are refreshed with random lookups that discover new nodes. Each lookup queries several nodes concurrently, waits for their responses, and keeps moving to closer nodes until the set closest to the target is found.
  • Network diversity: the number of nodes from the same IP range is limited, which makes the table harder to attack.

This design gives the Kademlia network good self-organization and fault tolerance, and keeps routing performance stable while nodes join and leave dynamically.
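The add-or-replace rule for a single bucket can be sketched as follows. This is a simplified model: the `bucket` type is hypothetical, the liveness check is passed in as a function instead of sending a real Ping, and refinements such as a replacement list are left out.

```go
package main

import "fmt"

// bucket keeps node IDs ordered from most to least recently seen.
type bucket struct {
	entries []string
	size    int
}

// seen records contact with id. alive reports whether a node still
// answers a liveness check; it is only consulted when the bucket is full.
func (b *bucket) seen(id string, alive func(string) bool) {
	for i, e := range b.entries {
		if e == id {
			// Known node: move it to the front (refresh its timestamp).
			copy(b.entries[1:i+1], b.entries[:i])
			b.entries[0] = id
			return
		}
	}
	if len(b.entries) < b.size {
		b.entries = append([]string{id}, b.entries...)
		return
	}
	oldest := b.entries[len(b.entries)-1]
	if alive(oldest) {
		return // the bucket is full of live nodes: drop the newcomer
	}
	b.entries = append([]string{id}, b.entries[:len(b.entries)-1]...)
}

func main() {
	up := func(string) bool { return true }
	down := func(string) bool { return false }
	b := &bucket{size: 2}
	b.seen("a", up)
	b.seen("b", up)
	b.seen("a", up)   // refresh a
	b.seen("c", up)   // bucket full, oldest still alive: c is dropped
	b.seen("c", down) // oldest no longer answers: c replaces it
	fmt.Println(b.entries)
}
```

Preferring long-lived nodes over newcomers is what makes it hard for an attacker to flush a routing table by flooding it with fresh identities.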

UDP communication mechanism


Discovery v4 uses the UDP protocol for communication and supports four message types:
Message type (message code), function, handler, and purpose:
Ping (code 1): node liveness check, handled by handlePing()
  • Verify whether the node is online
  • Update the last active time of the node
  • Trigger a Pong response
Pong (code 2): response to a Ping message, handled by handlePong()
  • Confirm that the node is active
  • Complete the round-trip time measurement
  • Verify the node's network address
Findnode (code 3): find the specified node, handled by handleFindnode()
  • Request the nodes closest to the target
  • Implement the Kademlia lookup algorithm
  • Build and maintain routing tables
Neighbors (code 4): return a list of neighbor nodes, handled by handleNeighbors()
  • Respond to Findnode requests
  • Provide node discovery information
  • Spread knowledge of the network topology

3.2 Discovery v5 protocol evolution

ENR (Ethereum Node Records) system


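Discovery v5 introduces the ENR record system, which provides structured node information:
type Record struct {
    seq       uint64 // sequence number, incremented on every update
    signature []byte // signature over the record content
    raw       []byte // RLP-encoded record
    pairs     []pair // sorted key/value pairs
}

// ENR example. Imports needed:
//   "crypto/ecdsa", "net",
//   "github.com/ethereum/go-ethereum/p2p/enode",
//   "github.com/ethereum/go-ethereum/p2p/enr"
func createENR(priv *ecdsa.PrivateKey) (*enr.Record, error) {
    var r enr.Record
    r.Set(enr.IP(net.IPv4(127, 0, 0, 1)))
    r.Set(enr.TCP(30303))
    r.Set(enr.UDP(30303))
    // SignV4 sets the "id" ("v4") and "secp256k1" entries and signs the record.
    if err := enode.SignV4(&r, priv); err != nil {
        return nil, err
    }
    return &r, nil
}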

Comparison between ENR and enode


In Discv4, nodes are identified with the enode URL format (enode://<public key>@<IP>:<port>). ENR has the following advantages over enode:
  • Extensibility: an enode only contains the public key, IP, and port; ENR supports arbitrary key-value pairs.
  • Signature verification: an ENR carries a signature, so its data can be trusted; an enode has no signature and is easy to tamper with.
  • Protocol support: an ENR can declare sub-protocols, which suits multi-protocol scenarios (such as Ethereum 2.0).
  • Versioning: the ENR sequence number supports dynamic updates, whereas an enode has to be replaced manually.
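To make the comparison concrete, the sketch below parses an enode URL with the Go standard library and shows that it yields exactly three facts: a public key, an IP, and a TCP port. The `enodeInfo` type and `parseEnode` function are hypothetical helpers written for this example, and the 64-byte key length is an assumption based on the 128 hex characters seen in published enode URLs.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// enodeInfo is everything an enode URL can express.
type enodeInfo struct {
	pubkey []byte
	ip     net.IP
	tcp    uint16
}

// parseEnode splits enode://<hex public key>@<ip>:<port> into its parts.
func parseEnode(raw string) (enodeInfo, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return enodeInfo{}, err
	}
	if u.Scheme != "enode" || u.User == nil {
		return enodeInfo{}, errors.New("not an enode URL")
	}
	key, err := hex.DecodeString(u.User.Username())
	if err != nil || len(key) != 64 {
		return enodeInfo{}, errors.New("public key must be 64 bytes of hex")
	}
	ip := net.ParseIP(u.Hostname())
	if ip == nil {
		return enodeInfo{}, errors.New("invalid IP address")
	}
	port, err := strconv.ParseUint(u.Port(), 10, 16)
	if err != nil {
		return enodeInfo{}, errors.New("invalid TCP port")
	}
	return enodeInfo{pubkey: key, ip: ip, tcp: uint16(port)}, nil
}

func main() {
	info, err := parseEnode("enode://d860a01f9722d78051619d1e2351aba3f43f943f6f00718d1b9baa4101932a1f5011f16bb2b1bb35db20d6fe28fa0bf09636d26a87d31de9ec6203eeedb1f666@18.138.108.67:30303")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(info.pubkey), info.ip, info.tcp)
}
```

Nothing in the URL is signed or versioned, which is the gap that ENR's signature and sequence number are meant to close.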

Topic discovery mechanism


Discovery v5 introduces a topic-based node discovery mechanism that lets nodes be classified and found by the service or protocol they provide. This mechanism is particularly suitable for multi-protocol environments such as Ethereum 2.0. Topic discovery works through four cooperating steps:
  • When a node starts, it announces the topics it supports to the network to register itself.
  • Each node in the network maintains a table that maps topics to node lists, which is used to store and retrieve registrations.
  • A client queries the node list of a topic to find nodes that provide the service it needs.
  • Nodes re-advertise their topics regularly to keep their registrations valid and the information current.


Topic discovery implementation mechanism


The following design points keep topic discovery efficient and flexible:
  • A hierarchical topic naming scheme (e.g. eth2/beacon_chain/mainnet) classifies and organizes services at a fine granularity.
  • Each topic registration has a limited lifetime and must be refreshed regularly to stay active.
  • Several nodes can register under the same topic, which spreads load and lets the client choose the best node based on performance indicators.
  • Nodes can register and unregister topics at any time, so the system adapts quickly to service changes and network topology adjustments.



Topic discovery gives the Ethereum network more targeted node discovery. It is especially useful when a node needs peers that offer a specific service, for example when discovering validators or building shard networks.

3.3 DNS Discovery Protocol

EIP-1459 standard implementation


The DNS discovery protocol is based on the EIP-1459 standard and implements decentralized node discovery through standard DNS infrastructure. The core of the protocol is the DNS client, which integrates several key components to ensure efficient and reliable node discovery.

The implementation mechanism of DNS query adopts the standard TXT record query method: the system combines the node hash value and the domain name to form a complete DNS query name, queries the corresponding TXT record through the standard DNS parser, and then traverses all the returned TXT records to try to parse them into valid node entries. Once a valid entry that meets the verification scheme is found, it will be returned immediately. If all records cannot be parsed, a query failure error will be returned. This design makes full use of the existing DNS infrastructure to achieve high availability and globally distributed node discovery services.

Tree structure design


DNS discovery uses a Merkle tree structure to organize node information, and builds a hierarchical node discovery system through different types of entries:

DNS tree entry type description:
Entry type, function, and content:
Root Entry 🌳 (tree entry point)
  • Contains the branch and link hashes
  • Sequence number used for version control
  • Digital signature ensures integrity
Branch Entry 🌿 (intermediate node)
  • Contains a list of child node hashes
  • Implements the hierarchical structure of the tree
  • Supports load distribution
Link Entry 🔗 (pointer to an external DNS tree)
  • Enables cross-domain node discovery
  • Supports distributed management
  • Verified by domain name and public key
ENR Entry 📋 (leaf node)
  • Contains a complete node record
  • Network address and protocol information
  • Node identity and capabilities
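A client walking the tree first has to decide which kind of entry a TXT record holds. The sketch below classifies a record by the text prefixes that EIP-1459 defines for the four entry types. The helper names and the sample hashes are placeholders for illustration, and no DNS query is performed.

```go
package main

import (
	"fmt"
	"strings"
)

// entryKind classifies a TXT record value by its EIP-1459 prefix.
func entryKind(txt string) string {
	switch {
	case strings.HasPrefix(txt, "enrtree-root:v1 "):
		return "root"
	case strings.HasPrefix(txt, "enrtree-branch:"):
		return "branch"
	case strings.HasPrefix(txt, "enrtree://"):
		return "link"
	case strings.HasPrefix(txt, "enr:"):
		return "enr"
	}
	return "unknown"
}

// branchChildren returns the child hashes listed in a branch entry.
func branchChildren(txt string) []string {
	list := strings.TrimPrefix(txt, "enrtree-branch:")
	if list == "" {
		return nil
	}
	return strings.Split(list, ",")
}

// queryName builds the DNS name under which a child entry is stored.
func queryName(hash, domain string) string {
	return hash + "." + domain
}

func main() {
	// HASHA and HASHB are placeholder hashes, not real tree entries.
	branch := "enrtree-branch:HASHA,HASHB"
	fmt.Println(entryKind(branch))
	for _, h := range branchChildren(branch) {
		fmt.Println(queryName(h, "nodes.example.org"))
	}
}
```

Resolving a tree then means starting at the root entry, following branch entries by querying each child name, and collecting the ENR leaves.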


3.4 Node mixing and scheduling

FairMix node mixer


FairMix is a fair node mixer. It takes nodes from multiple discovery sources (such as Discovery v4, Discovery v5, and DNS) in turn, so that every source gets an equal chance to supply nodes and no single source can monopolize the discovery process. The result is balanced load and fair scheduling across discovery sources.
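The fairness idea can be illustrated with a plain round-robin over sources. This sketch is not the FairMix implementation: the `source` and `mixer` types are hypothetical, sources are modeled as simple functions, and the timeouts and goroutines a real mixer would need for slow sources are left out.

```go
package main

import "fmt"

// source yields the next discovered node, or false when it has none.
type source func() (string, bool)

// mixer hands out nodes from its sources in round-robin order.
type mixer struct {
	sources []source
	next    int
}

// Next returns a node from the next source that has one available.
func (m *mixer) Next() (string, bool) {
	for tried := 0; tried < len(m.sources); tried++ {
		s := m.sources[m.next]
		m.next = (m.next + 1) % len(m.sources)
		if n, ok := s(); ok {
			return n, true
		}
	}
	return "", false
}

// fromSlice builds a source that replays a fixed list of nodes.
func fromSlice(nodes ...string) source {
	i := 0
	return func() (string, bool) {
		if i >= len(nodes) {
			return "", false
		}
		n := nodes[i]
		i++
		return n, true
	}
}

func main() {
	m := &mixer{sources: []source{
		fromSlice("v4-a", "v4-b", "v4-c"), // a busy source
		fromSlice("dns-a"),                // a sparse source
	}}
	for n, ok := m.Next(); ok; n, ok = m.Next() {
		fmt.Println(n)
	}
}
```

The sparse source gets its turn as the second result even though the busy source has more nodes to offer.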

Boot node mechanism


Boot nodes give new nodes an entry point into the network:
var MainnetBootnodes = []string{
    // Ethereum Foundation Go Bootnodes
    "enode://d860a01f9722d78051619d1e2351aba3f43f943f6f00718d1b9baa4101932a1f5011f16bb2b1bb35db20d6fe28fa0bf09636d26a87d31de9ec6203eeedb1f666@18.138.108.67:30303",
    "enode://22a8232c3abc76a16ae9d6c3b164f98775fe226f0917b0ca871128a74a8e9630b458460865bab457221f1d448dd9791d24c4e5d88786180ac185df813a68d4de@3.209.45.79:30303",
    // more boot nodes...
}

Node type

Node type (connection flag; connection limit; reconnection; configuration method) and features:
Bootstrap Nodes (dynDialedConn; limited by MaxPeers; no automatic reconnection; BootstrapNodes configuration)
  • Seed nodes used when the network starts
  • Provide initial node discovery
  • Help new nodes join the network
Static Nodes (staticDialedConn; limited by MaxPeers; automatic reconnection; StaticNodes configuration or AddPeer() method)
  • Preconfigured persistent connections
  • Automatically reconnect after disconnection
  • Higher priority than dynamic nodes
Trusted Nodes (trustedConn; not limited by the number of connections; automatic reconnection; TrustedNodes configuration or AddTrustedPeer() method)
  • Can exceed the MaxPeers limit
  • Highest connection priority
  • Bypass all connection checks
Dynamic Nodes (dynDialedConn; limited by MaxPeers; no automatic reconnection; obtained automatically through the node discovery protocols)
  • Found via the discovery protocols
  • Fill the remaining connection slots
  • Make up most of the network topology
Inbound Nodes (inboundConn; limited by MaxInboundConns; passive connection; other nodes connect actively)
  • Passively accepted connections
  • Limited by the number of inbound connections
  • May be actively disconnected

Connection priority
  • Trusted nodes > Static nodes > Dynamic nodes > Inbound nodes
  • Trusted nodes can exceed the connection limit.
  • Static nodes take priority over dynamic nodes when connection slots are assigned.
  • Inbound connections are accepted while the node is still below its connection limits.

Chapter 4: RLPx Encrypted Transmission Protocol

4.1 RLPx protocol stack design

Protocol hierarchy


The RLPx protocol stack uses a layered design in which each layer has a specific responsibility.


4.2 Working mechanism

  1. RLPx performs its handshake with the Elliptic Curve Integrated Encryption Scheme (ECIES), which protects the handshake messages.
  2. Both sides run an Elliptic Curve Diffie-Hellman (ECDH) key exchange to agree on a shared secret.
  3. The session keys are derived from the ECDH shared secret with a key derivation function.
  4. Forward secrecy: a new ephemeral key pair is generated for every handshake.
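The ephemeral key exchange can be demonstrated with the Go standard library. One substitution is made loudly here: RLPx works on the secp256k1 curve, which the standard library does not provide, so this sketch uses P-256 as a stand-in purely to show the ECDH mechanics. The function name is hypothetical.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"fmt"
)

// sharedSecrets generates a fresh (ephemeral) key pair for each side and
// returns the secret each side computes from the other's public key.
func sharedSecrets() (a, b []byte, err error) {
	curve := ecdh.P256() // stand-in curve; RLPx itself uses secp256k1
	alice, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	bob, err := curve.GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if a, err = alice.ECDH(bob.PublicKey()); err != nil {
		return nil, nil, err
	}
	if b, err = bob.ECDH(alice.PublicKey()); err != nil {
		return nil, nil, err
	}
	return a, b, nil
}

func main() {
	a, b, err := sharedSecrets()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("secrets match:", bytes.Equal(a, b))
}
```

Because both key pairs are thrown away after the handshake, a later compromise of a node's long-term key does not reveal the secrets of past sessions.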

4.3 Message frame format and processing

Frame structure design


The RLPx message frame contains two parts: header and data:
Message frame format:
+--------+--------+--------+--------+
|      Header (16 bytes)            |
+--------+--------+--------+--------+
|      Header MAC (16 bytes)        |
+--------+--------+--------+--------+
|      Frame Data (variable)        |
+--------+--------+--------+--------+
|      Frame MAC (16 bytes)         |
+--------+--------+--------+--------+

Header format:
+--------+--------+--------+
| Frame Size (3 bytes)    |
+--------+--------+--------+
| Header Data (13 bytes)  |
+--------+--------+--------+
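The sizes in the two diagrams translate into a few lines of arithmetic. The sketch below builds an unencrypted header with its 3-byte big-endian frame size and computes how many bytes a frame occupies on the wire, assuming frame data is padded to a 16-byte boundary. The helper names are hypothetical, and the 13 bytes of header data are simply left as zeros.

```go
package main

import "fmt"

const (
	headerLen = 16 // header size in bytes
	macLen    = 16 // size of the header MAC and of the frame MAC
)

// putUint24 writes v as a 3-byte big-endian integer.
func putUint24(b []byte, v uint32) {
	b[0], b[1], b[2] = byte(v>>16), byte(v>>8), byte(v)
}

// readUint24 reads a 3-byte big-endian integer.
func readUint24(b []byte) uint32 {
	return uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2])
}

// paddedLen rounds n up to the next multiple of 16.
func paddedLen(n int) int {
	if r := n % 16; r != 0 {
		return n + 16 - r
	}
	return n
}

// plainHeader builds the header before encryption: the frame size in the
// first 3 bytes, the remaining 13 bytes zeroed in this sketch.
func plainHeader(frameSize uint32) [headerLen]byte {
	var h [headerLen]byte
	putUint24(h[:3], frameSize)
	return h
}

// wireSize is the total number of bytes a frame occupies on the wire.
func wireSize(frameSize int) int {
	return headerLen + macLen + paddedLen(frameSize) + macLen
}

func main() {
	h := plainHeader(20)
	fmt.Println(readUint24(h[:]), wireSize(20))
}
```

A 20-byte payload is padded to 32 bytes, so the complete frame takes 16 + 16 + 32 + 16 = 80 bytes.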

AES-CTR stream encryption


RLPx encrypts frames as a stream with AES in CTR mode. Reading a frame follows a read, verify, decrypt sequence: first, read the 32-byte frame header (16 encrypted header bytes plus a 16-byte MAC), verify the header MAC, and decrypt the header to obtain the frame size; next, read the frame data, padded to a 16-byte boundary, together with its MAC, and verify the integrity of the frame data; finally, decrypt the frame data and strip the padding to recover the original message. Verifying the header and the data with separate MACs, combined with full encryption, protects both the confidentiality and the integrity of each message.
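The stream property of CTR mode is easy to see with the standard library: one cipher stream encrypts frame after frame, and a second stream created from the same key decrypts them in the same order. This is a sketch only. The helper names are hypothetical, MAC verification is omitted, and the all-zero IV is an assumption that is only acceptable because the key is meant to be a fresh per-session key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// newStream creates an AES-CTR key stream with an all-zero IV.
// Reusing a zero IV is only safe when the key itself is never reused.
func newStream(key []byte) (cipher.Stream, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, block.BlockSize())
	return cipher.NewCTR(block, iv), nil
}

// roundTrip encrypts each frame with one stream and decrypts it with
// another, as the two ends of a connection would.
func roundTrip(key []byte, frames ...string) ([]string, error) {
	enc, err := newStream(key)
	if err != nil {
		return nil, err
	}
	dec, err := newStream(key)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, f := range frames {
		ct := make([]byte, len(f))
		enc.XORKeyStream(ct, []byte(f)) // sender side
		pt := make([]byte, len(ct))
		dec.XORKeyStream(pt, ct) // receiver side
		out = append(out, string(pt))
	}
	return out, nil
}

func main() {
	key := make([]byte, 32) // demo key; a real session key comes from the handshake
	out, err := roundTrip(key, "hello", "world")
	fmt.Println(out, err)
}
```

Both streams advance with every byte processed, so frames must be decrypted in exactly the order they were encrypted.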

MAC message authentication


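Message integrity is protected by a MAC that RLPx builds from a running Keccak-256 hash and AES, rather than a standard HMAC. The input data is first written to the hash, and the first 16 bytes of the digest become a seed. The current digest is then encrypted with AES and XORed with the seed. Finally, the XOR result is written back into the hash, and the first 16 bytes of the new digest form the MAC value.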

Snappy compression optimization


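Message payloads can be compressed with Snappy to reduce the amount of data sent over the network. Compression is enabled once both peers announce devp2p version 5 or higher in their Hello messages.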

Chapter 5: Application protocol layer implementation

5.1 In-depth analysis of ETH protocol


The ETH protocol is the main application protocol of the Ethereum network and is responsible for the synchronization and dissemination of blockchain data. The protocol has evolved through multiple versions, and currently mainly uses the ETH/68 and ETH/69 versions to provide complete blockchain data exchange capabilities for full nodes.

5.2 SNAP protocol status synchronization


The SNAP protocol is designed specifically to make state synchronization efficient. It lets nodes download a recent state snapshot directly, which greatly reduces synchronization time.

5.3 LES light client protocol


LES (Light Ethereum Subprotocol) gives resource-constrained devices a lightweight way to participate in the Ethereum network. It lets light clients verify transactions and query state without downloading the full blockchain.

5.4 Protocol extension and customization


Go-Ethereum provides a complete protocol extension framework, allowing developers to implement custom protocols according to specific needs. It provides a strong technical foundation for the innovation of blockchain applications.

Summary


The Go-Ethereum P2P network is heavily optimized for blockchain workloads. Its event-driven architecture handles block propagation and transaction synchronization efficiently, and several layers of security protection defend against malicious nodes. Fine-grained concurrency control lets it serve large networks of nodes at high performance, and state-machine management keeps complex network state transitions reliable. Thorough resource management supports long-running, stable node operation, and multiple fault-tolerance mechanisms absorb frequent node churn and network anomalies, which gives the whole blockchain network the ability to heal itself and keep running. This article covers only part of the subject.

