Developed by Yeison Nolberto Cardona Álvarez, MSc.César Germán Castellanos Dominguez, PhD.Digital Signal Processing and Control Group | Grupo de Control y Procesamiento Digital de Señales (GCPDS)Universidad Nacional de Colombia sede Manizales
Chaski Confluent¶
Chaski-Confluent is an advanced distributed communication framework designed to streamline data exchange between nodes over TCP/IP networks. It features robust node discovery, efficient message handling, dynamic pairing based on subscription topics, and extends functionality with remote interactions, ensuring resilience and flexibility in complex network topologies.
The project aims to provide a reliable and scalable solution for distributed systems, addressing the challenges of latency management, subscription routing, remote method invocation, and connection stability. With Chaski-Confluent, developers can easily build and maintain efficient communication protocols in their distributed applications.
Chaski-Confluent is a comprehensive solution to the challenges of distributed systems. Built with robustness and scalability in mind, it leverages advanced networking techniques to facilitate data exchange between nodes. Its architecture allows for dynamic scaling, maintaining communication efficiency without compromising performance as the network grows. This makes Chaski-Confluent an ideal choice for resilient and scalable distributed applications that need to adapt to changing conditions and workloads.
One of the standout features of Chaski-Confluent is its support for both TCP and UDP protocols. This dual-protocol capability ensures that developers can choose the most appropriate method for their specific use cases. Additionally, the sophisticated node discovery mechanism and intelligent subscription-based message routing enable the creation of dynamic network topologies where nodes can communicate effortlessly. These features, along with effective latency management and remote method invocation, position Chaski-Confluent as a powerful tool for developing modern distributed systems.
Main Features of Chaski Confluent¶
The Chaski-Confluent framework provides various powerful features that make it suitable for managing distributed systems. Here are some of the key features:
TCP and UDP Communication: Chaski Confluent supports both TCP and UDP protocols, allowing for reliable and timely message delivery between nodes. The framework ensures efficient data transfer irrespective of the underlying network conditions.
Node Discovery and Pairing: Automatic discovery of nodes based on shared subscription topics is a crucial feature. Chaski Confluent facilitates the pairing of nodes with common interests, making it easy to build dynamic and scalable network topologies.
Ping and Latency Management: The framework includes built-in mechanisms for measuring latency between nodes through ping operations. This helps in maintaining healthy connections and ensures that communication within the network is optimal.
Subscription Management: Nodes can subscribe to specific topics, and messages are routed efficiently based on these subscriptions. This allows for effective communication and data exchange only with relevant nodes.
Keep-alive and Disconnection Handling: Chaski Confluent ensures that connections between nodes remain active by implementing keep-alive checks. If a connection is lost, the framework handles reconnection attempts gracefully to maintain network integrity.
Remote Method Invocation: The Chaski Remote class enables remote method invocation and interaction across distributed nodes. Nodes can communicate transparently, invoking methods and accessing attributes on remote objects as if they were local.
Security: Implement robust security measures to protect data and ensure safe communication between the nodes. Features like encryption and authentication are essential to safeguarding the integrity of the network. For example, you can set up a Certificate Authority (CA) within your network to manage SSL certificates and ensure encrypted communication.
Flexible Configuration: The framework offers a flexible configuration system, allowing users to customize various parameters such as timeouts, retry intervals, and buffer sizes. This adaptability helps in optimizing the performance according to specific requirements.
Logging and Monitoring: Comprehensive logging and monitoring capabilities are integrated into the framework, providing real-time insights into the network activity and performance metrics. This aids in troubleshooting and maintaining the health of the system.
File Streaming and Transfer: ChaskiStreamer includes helpers to send
files in chunks through the network using push_file and to accept
incoming files when allow_incoming_files is enabled. This
facilitates distributing large payloads without blocking the event loop.
Persistent Storage: Nodes can store key/value pairs using an
SQLite-backed PersistentStorage. Data can be requested or served to
peers with the ChaskiStorageRequest message type.
Synchronous Interface: ChaskiStreamerSync wraps the asynchronous
streamer in a dedicated thread, offering a blocking API that integrates
with traditional synchronous code bases.
Celery Integration: The package ships with a custom Kombu transport
(chaski.utils.transport) so Chaski can act as a Celery broker or
backend.
Message Pool with TTL: Each node keeps a bounded pool of recently processed messages. This avoids processing duplicates and provides a configurable time-to-live for cached entries.
Chaski-Confluent components¶
Chaski Node¶
The Chaski_ Node is an essential component of the Chaski-Confluent system. It is responsible for initiating and managing network communication between distributed nodes. This class handles functions such as connection establishment, message passing, node discovery, and pairing based on shared subscriptions. Nodes keep track of their connections as “edges” where latency information and subscription data are stored. Each node can propagate received messages to its peers and caches recent messages in a bounded pool to avoid processing
Chaski Streamer¶
The Chaski-Streamer extends the functionality of Chaski-Node by
introducing asynchronous message streaming capabilities. It sets up an
internal message queue to manage incoming messages, allowing efficient
and scalable message processing within a distributed environment. The
ChaskiStreamer can enter an asynchronous context, enabling the user to
stream messages using the async with statement. This allows for
handling messages dynamically as they arrive, enhancing the
responsiveness and flexibility of the system. The streamer also supports
chunked file transfer via push_file and can store temporary results
in a PersistentStorage database. When synchronous behaviour is
required, ChaskiStreamerSync exposes the same API from a background
thread.
Chaski Remote¶
The Chaski-Remote class enhances the Chaski-Node functionality by enabling remote method invocation and interaction across distributed nodes. It equips nodes with the ability to communicate transparently, invoking methods and accessing attributes on remote objects as if they were local. This is achieved by utilizing the Proxy class, which wraps around the remote objects and provides a clean interface for method calls and attribute access. The remote node verifies module availability through a lightweight UDP check before the proxy is returned, ensuring that requested services are reachable.
Asynchronous Communication Architecture¶
The core functionalities of Chaski-Confluent revolve around efficient
and scalable communication mechanisms integral to modern distributed
systems. Central to its architecture is the use of the Python
asyncio library, which facilitates asynchronous programming to
manage concurrent connections without the overhead of traditional
threading models. This allows for high-performance message handling and
real-time node interactions, optimizing the framework for low-latency
and responsive communication.
In implementing Chaski-Confluent, leveraging asyncio ensures that tasks
such as node discovery, subscription management, and remote method
invocation are carried out efficiently. Asynchronous programming enables
the framework to handle multiple network operations simultaneously,
maintaining high throughput and scalability even under heavy network
loads. The integration of asyncio thus provides a robust foundation
for building dynamic and resilient distributed systems, ensuring
seamless and efficient data exchange across nodes.
Celery Transport and CLI Utilities¶
Chaski includes a custom Kombu transport so it can be used as a Celery
broker. The ChaskiChannel class relies on ChaskiStreamerSync to
publish and consume tasks through the topic celery_tasks. Several
command line scripts under chaski/scripts/ make it easy to start a
streamer root, a remote proxy or a certificate authority from the shell.
Documentation Overview¶
- ChaskiNode: Distributed Node Communication and Management
- ChaskiStreamer: Scalable Message Streaming in Distributed Networks
- ChaskiRemote: Proxy for Distributed Network Interactions
- Certification Authority
- Celery Integration with Chaski Confluent
- Chaski Confluent: Scripts
- Chaski Confluent: Scripts
- Create the streamer node
- Connect to the root node