Software Integration
Seamless communication: that, among other consequential advantages, is the ultimate goal when integrating your software. And today, integrating modern software means fusing various applications and/or systems, often across distributed environments, with the common goal of unifying isolated data. This effort often signifies the transition of legacy applications to cloud-based systems and messaging infrastructure via microservices and REST APIs. So what's next? Where is the path to seamless communication and nuanced architecture taking us? Dive into our 2023 Software Integration Trend Report and fill the gaps in modern integration practices by exploring trends in APIs, microservices, and cloud-based systems and migrations. You have to integrate to innovate!
Integration testing is an approach in which different components or modules of a software application are tested as a combined entity. You can run integration tests regardless of whether the modules were written by one programmer or by several. Before a release, a software application undergoes extensive testing, iteration, and exception handling to identify bugs and make the software business ready. Integration testing is a primary phase of the software testing process, carried out while the application is still in development.

What Is Integration Testing?
When separate modules are combined and tested as a whole, this software testing phase is referred to as integration testing. It takes place after unit testing and before the verification and validation process. What makes integration testing essential is its ability to check the behavior of different units of a system together: taken individually, the units may function correctly with almost no errors, but when they are brought together, incorrect behavior, if it exists, is uncovered. Integration testing is crucial because it is done at an early stage of development and helps prevent serious issues that would otherwise be expensive to fix later. You should run integration tests every time you change the existing code.

What Is the Purpose of Integration Testing?
Initially, software testing did not depend on integration testing, and nobody had thought about building an advanced testing phase capable of finding issues during the development process. But with the growing digital sphere, the demand for integration testing has increased. Here are some major reasons why integration testing is crucial:
- To analyze integrated software modules: Analyzing how integrated software modules work together is the primary objective of integration testing. As per the test plan requirements, integration testing verifies the connectivity between individual modules by examining the values they pass to one another and how those values are handled.
- To ensure seamless integration between third-party tools and different modules: It's crucial to ensure that the data accepted by an API is correct so that the response generated is as per the requirement. Integration testing of the interaction between modules and third-party tools helps ensure that the data is correct.
- To fix exception handling: Before releasing the final build, it is crucial to pinpoint weak spots and red-flag them to minimize exception handling defects as much as possible. Defects missed at this stage are expensive to fix after the release.

Difference Between Unit Testing and Integration Testing
- Unit testing is a white-box testing process; integration testing is a black-box testing process.
- Unit testing is performed by developers; integration testing is performed by testers.
- In unit testing, finding defects is easy because each unit is tested individually; in integration testing, finding defects is harder because all modules are tested together.
- Unit testing is always performed first, before any other testing process; integration testing is performed after unit testing and before system testing.
- In unit testing, developers are aware of the internal design of the software; in integration testing, testers are not aware of the internal design of the software.

Difference Between Integration Testing and System Testing
- Integration testing ensures all combined units can work together without errors; system testing ensures that the total build meets the business requirements and specifications.
- Integration testing is black-box testing; system testing is a mix of white-box and black-box (grey-box) testing.
- Integration testing doesn't fall into the acceptance testing class and performs functional types of tests; system testing falls into the acceptance testing class and performs functional and non-functional tests.
- Integration testing is level-two testing; system testing is level-three testing.
- Integration testing mainly identifies interface errors; system testing helps identify system errors.

Benefits of Integration Testing
- Integration testing helps expose defects that can arise when components are integrated and need to interact with each other.
- It makes sure that integrated modules work correctly as expected.
- It is a quick testing approach: once the modules are available, the tester can start testing them.
- It detects all errors related to the interfaces between modules.
- It helps modules interact with third-party tools and, most importantly, different APIs.
- It is more efficient because it typically covers a large volume of the system.
- It increases test coverage and also improves the reliability of tests.

Types of Integration Testing
Integration testing is performed by combining different functional units and testing them to examine the results. Each type of integration testing focuses on testing the software differently. Integration testing consists of the following types:
- Incremental integration testing
- Non-incremental/big bang integration testing

Incremental Integration Testing
In the incremental testing approach, all logically related modules are integrated, and then testing is done to check that the application functions properly as per the requirement. After this, the other related modules are integrated incrementally, and the process continues until all the integrated, logically related modules have been tested successfully. The incremental approach is carried out by three different methods:
- Top-down approach
- Bottom-up approach
- Sandwich approach

Top-Down Approach
The top-down integration testing approach involves testing top-level units first, with lower-level units tested step by step afterward. Test stubs are needed to simulate lower-level units that may not be available during the initial phases (a minimal stub sketch appears at the end of this section).
Advantages:
- It requires little planning.
- It is convenient for small systems.
- It covers all the modules.
Disadvantages:
- The top-down testing approach is not recommended for large-scale systems, as fault localization is complicated.
- As the prerequisite of this approach is completing all modules, the testing team remains extremely time-bound when executing the tests.
- Since all the modules are tested simultaneously, you can't test modules based on priority or critical functionality.

Non-Incremental/Big Bang Testing
In this non-incremental testing approach, all the developed modules are tested individually and then integrated and tested once again. This is also known as big bang integration testing.

Big Bang Integration Testing
This type of integration testing involves coupling most of the developed modules into a larger system, which is then tested as a whole. This method is very effective for saving time, but test cases and their results must be recorded correctly to streamline the integration process and allow the testing team to achieve its goals.
Advantages:
- Good for testing small systems.
- Allows errors to be found very quickly and thus saves a lot of time.
Disadvantages:
- Fault localization is tough.
- Finding the root cause of a problem is quite difficult.
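As referenced above, the top-down approach leans on test stubs. Below is a minimal, hypothetical Java sketch of the idea: OrderService is the upper-level module under test, and InventoryServiceStub stands in for a lower-level module that isn't ready yet. The class and method names are illustrative only, not part of any real project.

```java
// Hypothetical top-down example: OrderService (upper module) is integration tested
// before its lower-level dependency is finished, so a stub stands in for it.
interface InventoryService {
    boolean reserve(String sku, int quantity);
}

// Test stub simulating the lower-level unit that is not yet available
class InventoryServiceStub implements InventoryService {
    @Override
    public boolean reserve(String sku, int quantity) {
        return true; // canned answer; the real logic arrives in a later build
    }
}

class OrderService {
    private final InventoryService inventory;

    OrderService(InventoryService inventory) {
        this.inventory = inventory;
    }

    String placeOrder(String sku, int quantity) {
        return inventory.reserve(sku, quantity) ? "CONFIRMED" : "REJECTED";
    }
}

public class OrderServiceTopDownTest {
    public static void main(String[] args) {
        OrderService service = new OrderService(new InventoryServiceStub());
        if (!"CONFIRMED".equals(service.placeOrder("SKU-1", 2))) {
            throw new AssertionError("Order flow broke at the OrderService/InventoryService interface");
        }
        System.out.println("Top-down integration test with stub passed");
    }
}
```

When the real InventoryService becomes available, the stub is swapped out and the same test is repeated against the integrated pair, which is exactly the incremental progression described above.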
How Is Integration Testing Done?
When the system is ready and the units have been successfully tested individually, they can be integrated and tested. The complete process of integration testing involves several steps and a range of frameworks and continuous integration tooling. Here's how you can perform integration testing:
- First, prepare a test integration plan and the required frameworks.
- Decide on the type of integration testing approach: bottom-up, top-down, sandwich, or big bang.
- Design the test cases, scripts, and scenarios.
- Deploy the chosen components and run the integration tests.
- Track and record the results of each test, whether it surfaces errors and bugs or runs clean.
- Finally, repeat the process until the entire system has been tested.

Entry and Exit Criteria for Integration Testing
Integration testing has both entry and exit criteria that you should know before starting.
Entry criteria:
- Approval: The integration test plan document has been signed off and approved.
- Preparation: Integration test cases have been prepared.
- Data creation: Test data has been created.
- Unit testing: Unit testing of each developed module/component is complete.
- Dealing with defects: All high-priority and critical defects are closed.
- Test environment: The test environment is set up for integration testing.
Exit criteria:
- All the integration test cases on different parameters have been successfully executed.
- All critical and P1/P2 priority defects are closed.
- The test report has been prepared.

Example: Integration Test Cases
Integration test cases mainly focus on the data transfer between modules (which are already unit tested), the interfaces between the modules, and the integrated links. For example, consider integration test cases for a LinkedIn application:
- Verify the interface link between the login page and the home page: when a user enters correct login credentials, they should be directed to the home page.
- Verify the interface link between the home page and the profile page: when the user selects the profile option, the profile page should open.
- Verify the interface link between the network page and the connections page: clicking the accept button for a received invitation on the network page should show that invitation on the connections page.
- Verify the interface link between the notifications page and the "say congrats" button: clicking the "say congrats" button should direct the user to a new message window.
These examples show how LinkedIn-style workflows translate into integration test cases.

Manual and Automated Integration Testing
Integration testing usually doesn't require specific tools. These tests are often run manually by QA teams. In most cases, it happens in parallel with the development process, which is the most efficient approach. First, individual software units are created, and then these units are checked by the development team. After successful checks, QA engineers start combining different units and inspecting them, focusing first on the interfaces and then on the connections between these units; they don't require specific tools to inspect these, even though each is tested separately. Regarding automated testing, Selenium is the most widely used framework for integration testing. If you are starting with integration testing, don't waste time setting up expensive in-house test infrastructure. Opt for cloud-based testing platforms like LambdaTest.
Using LambdaTest's online browser farm, you can run integration tests across 3,000+ combinations of browsers, devices, and operating systems. Its simple onboarding process makes it easy to perform mobile app and web testing. LambdaTest supports automated testing tools like Selenium, Cypress, Playwright, Puppeteer, Appium, Espresso, and XCUITest, among others. Devs and testers can also leverage LambdaTest's HyperExecute, an end-to-end test orchestration cloud, to run automated tests up to 70% faster than traditional cloud grids.

Integration Testing Tools
With the automated tools available, integration testing can greatly impact the various modules of a software application; these tools simplify the process and make it more agile. Here are some of the best integration testing tools:

Selenium: Selenium is the leading large-scale open-source test automation framework for automating integration test suites for your web applications. Here are some primary features and highlights that make Selenium such a popular tool:
- It supports multiple languages: C#, Java, JavaScript, PHP, Python, Ruby, and Perl.
- It runs in different system environments: Mac, Windows, and Linux.
- It works with all popular browsers, including Firefox, Safari, and Chrome, as well as headless browsers.
- W3C standardization makes testing and scripting seamless.
- It allows running parallel tests with different hybrid test data.

Pytest: Pytest is widely used for writing and running test code for Python automation testing. It also scales up well and works for testing complex libraries and applications. Here are some features that make pytest an excellent choice for automated integration testing:
- Pytest can significantly reduce overall testing time by running tests in parallel.
- If test files and features are not directly indicated, pytest will discover them automatically.
- Pytest has built-in command-line support and test discovery support.

RFT: RFT stands for IBM Rational Functional Tester. It is a popular tool that makes it easy to create scripts that mimic the behavior of human testers. To enhance your testing experience, IBM offers other software solutions that you can integrate with RFT. Beyond maintaining test scripts, RFT provides several other features:
- A storyboard mode that simplifies editing and test visualization, in particular through screenshots.
- Recording tools to make test scripting easy.
- Data-driven testing for the same series of actions using varying data sets.
- Integration with other software for collaborative SDLC management.

VectorCAST: The VectorCAST software testing platform is one of the best on the market for automating testing activities across the software development lifecycle. The advantages of using VectorCAST are:
- A focus on embedded systems.
- Support for continuous and collaborative testing.
- It works with your existing software development tools.
- Embedded developers can use this highly automated unit and integration test tool to validate safety- and business-critical embedded systems.

LDRA: LDRA drives the market for software tools that automate code analysis and testing for safety-, mission-, and business-critical needs. With LDRA, you get:
- Customer-focused certification services.
- Consultancy offerings.
- LDRA tools for early error identification and elimination.
- Requirements tracing through static and dynamic analysis to unit testing.
- Verification for various hardware and software platforms.
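To make the Selenium option above concrete, here is a minimal, hypothetical Java test for the "login page to home page" interface link from the LinkedIn-style examples earlier. The URL and element locators are placeholders, and the sketch assumes the selenium-java dependency and a local ChromeDriver; on a cloud grid you would typically point a RemoteWebDriver at the grid URL instead.

```java
// Hypothetical Selenium integration test: verifies that a successful login hands off to the home page.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class LoginToHomePageIT {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/login");                      // placeholder URL
            driver.findElement(By.id("username")).sendKeys("valid-user"); // placeholder locators and credentials
            driver.findElement(By.id("password")).sendKeys("valid-pass");
            driver.findElement(By.id("login-button")).click();

            // The integration point under test: the login module should navigate to the home page module.
            if (!driver.getCurrentUrl().contains("/home")) {
                throw new AssertionError("Login did not navigate to the home page");
            }
            System.out.println("Login -> home page integration verified");
        } finally {
            driver.quit();
        }
    }
}
```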
Challenges of Integration Testing
Like any other testing technique, integration testing has challenges that testers and developers encounter, including:
- Managing integration testing is sometimes complex because of the many factors involved: databases, platforms, the environment, and so on.
- Integrating a new system with one or two legacy systems requires a lot of change and testing effort.
- Compatibility between systems developed by different companies is quite challenging for programmers.
- There are many different paths and permutations to cover when testing integrated systems.

Best Practices for Integration Testing
Before starting your integration testing, you should follow or implement a few best practices:
- Run unit tests before integration tests: It's crucial to discover bugs early in the development cycle because the later you discover a bug, the more expensive it is to fix. For a smooth development cycle, the individual pieces must be solid before you step up to "big things" like integration testing.
- Don't test business logic in integration tests: Unit tests are typically high-speed, so they are run for every build triggered in the CI environment. Since they target the fundamental correctness of code, running them frequently is critical to detecting bugs in business logic early, so that the developer who introduced a bug can fix it immediately.
- Keep your testing suites separate: Integration tests should not be run together with unit tests. Developers working on specific business logic in the code must be able to run unit tests and get near-immediate feedback to ensure that they haven't broken anything before committing code.
- Log extensively: A unit test has a specific scope and tests a tiny piece of your application, so when it fails, it's usually relatively easy to understand why and fix the problem. An integration test, by contrast, spans several components and environments, so extensive logging is often the only practical way to trace a failure back to its source.

All in All
The main objective of integration testing is to ensure that the entire software system works flawlessly when it is put together. If any critical aspects are overlooked during the unit testing phase, integration testing highlights them so they can be corrected before the final launch.
This is a detailed guide on mTLS and how to implement it with the Istio service mesh. We will cover the following topics:
- Understanding the mTLS protocol with respect to the TCP/IP suite
- SSL vs. TLS vs. mTLS
- Why is mTLS important?
- Use cases of mTLS
- Certificate authorities, public keys, and X.509 certificates: must-know mTLS concepts
- How does mTLS work?
- How to enable mTLS with Istio service mesh
- Certificate management for mTLS in Istio

What Is mTLS?
Mutual Transport Layer Security (mTLS) is a cryptographic protocol designed to authenticate two parties and secure their communication over the network. The mTLS protocol is an extension of TLS in which both parties, the web client and the web server, are authenticated. The primary aim of mTLS is to achieve the following:
- Authenticity: To ensure both parties are authentic and verified
- Confidentiality: To secure the data in transmission
- Integrity: To ensure the correctness of the data being sent

mTLS Protocol: A Part of the TCP/IP Suite
The mTLS protocol sits between the application and transport layers and encrypts only the messages (or packets). Conceptually, it can be seen as an enhancement to the TCP protocol.

SSL vs TLS vs mTLS: Which Is New?
Security engineers, architects, and developers often use SSL, TLS, and mTLS interchangeably because of their similarity. Loosely speaking, mTLS is an enhancement to TLS, and TLS is an enhancement to SSL. The first version of Secure Socket Layer (SSL) was developed by Netscape in 1994; the most popular versions were versions 2 and 3, created in 1995. It was so widely popular that it even made its way into a James Bond movie (Tomorrow Never Dies, 1997). The overall working of SSL is carried out by three sub-protocols:
- Handshake protocol: Used to authenticate the web client and the web server and establish a secure communication channel. During the handshake, a shared key is generated, for that session only, to encrypt the data during communication.
- Record protocol: Helps maintain the confidentiality of the data exchanged between the client and the server using the newly generated shared secret key.
- Alert protocol: If the client or the server detects an error, the alert protocol closes the SSL connection (the transmission of data is terminated), destroying all sessions, shared keys, and so on.
As internet applications multiplied, so did the need for fine-grained security of data on the network. So Transport Layer Security (TLS), a standard internet version of SSL, was developed by the IETF. Netscape handed the SSL project over to the IETF; TLS is an advanced version of SSL, and the core idea and implementation of the protocol are the same. The main difference between the SSL and TLS protocols is that the cipher suites (the algorithms) used to encrypt data in TLS are more advanced. Secondly, the handshake, record, and alert protocols were modified and optimized for internet usage.
Note: In the SSL handshake protocol, server authentication to the client (by sending a certificate) was mandatory, while client authentication was optional. TLS, likewise, only provided for authenticating web servers to clients, not vice versa.
Almost all the websites you visit over HTTPS use TLS certificates to establish themselves as genuine sites.
If you visit Google.com and click the padlock symbol, it will show the TLS certificates. TLS was mainly used for web applications where the client is a human user, and authenticating billions of clients or users is only feasible for some web applications. But as large monolithic applications broke into numerous microservices that communicate over the internet, the need for mTLS grew suddenly. The mTLS protocol ensures that both the web client and the web server authenticate themselves before a handshake. (We will see the working model of the mTLS protocol later in this article.)

Why Is mTLS More Important Than Ever?
Modern business is done through web applications whose underlying architecture follows a hybrid cloud model. Microservices are distributed across public/private clouds, Kubernetes, and on-prem VMs, and the communication among the various microservices and components happens over the network, posing a significant risk of hacking or malicious attacks. Below are a few scenarios of cyberattacks on the web that can be avoided entirely by using the mTLS protocol:
- Man-in-the-middle attack (MITM): Attackers place themselves between a client and a server to intercept data during transmission. When mTLS is used, attackers cannot authenticate themselves and will fail to steal the data.
- IP spoofing: Bad actors masquerade as someone you trust and inject malicious packets into the receiver. This is again solved by endpoint authentication in mTLS, which determines with certainty whether network packets or data originate from a source we trust.
- Packet sniffing: An attacker places a passive receiver near a wireless transmitter to obtain a copy of every packet transmitted. Such attacks are prevalent in the banking and fintech domains, where an attacker wants to steal sensitive information such as card numbers, banking application usernames, passwords, SSNs, and so on. Since packet sniffing is non-intrusive, it is tough to detect, so the best way to protect the data is cryptography. mTLS helps encrypt the data using complex cryptographic algorithms that are hard for packet sniffers to decipher.
- Denial-of-service (DoS) attacks: Attackers aim to make the network or the web server unusable by legitimate applications or users. This is done by sending malformed packets, a deluge of packets, or a large number of TCP connections to the host (the web server) so that the server ultimately crashes. DoS and distributed DoS (an advanced DoS technique) attacks can be avoided by invoking the mTLS protocol in the applicable communication: malicious requests are discarded before they ever enter the handshake phase.

Use Cases of mTLS in the Industry
The use cases of mTLS grow daily with the increasing amount of business done through web applications and the simultaneous rise in the threat of cyberattacks. Here are a few important use cases based on our discussions with leaders from various industries and domains, including banking, fintech, and online retail companies.
Hybrid cloud and multicloud applications: Whenever organizations use a mix of data centers (on-prem, public, or private cloud), the data leaves the secured perimeter and goes out of the network. In such cases, mTLS should be used to protect the data.
Microservices-based B2B software: Much of the B2B software on the market follows a microservices architecture, and each service talks to the others using REST APIs.
Even though all the services may be hosted in a single data center, the network should be secured to protect the data in transit (in case the firewall is breached).
Online retail and e-commerce applications: E-commerce and online retail applications usually use a content delivery network (CDN) to fetch the application from the server and show it to users. Although TLS is implemented in the CDN to authenticate itself when a user visits the page, there should also be a security mechanism, such as mTLS, to secure the network between the CDN and the web server.
Banking applications: Applications that carry highly sensitive transactions, such as banks, financial transaction apps, and payment gateways, should take extreme precautions to prevent their data from being stolen. Millions of online transactions happen every day through various banking and fintech apps, and sensitive information such as bank usernames, passwords, debit/credit card details, and CVV numbers can easily be stolen if the data on the network is not protected. Strict authentication and confidentiality can be applied to the network using mTLS.
Industry regulation and compliance: Every country has rules and standards that govern IT infrastructure and protect data. Policies such as FIPS, GDPR, PCI-DSS, HIPAA, and ISO 27001 outline strict security measures to protect data at rest and data in transit. mTLS provides the strict network authentication these standards call for, helping companies adhere to them.

Certificates and Public/Private Keys: Must-Know mTLS Concepts
Below are a few concepts you need to be aware of before digging into how mTLS works. (You can skip ahead if you are already comfortable with them.)
Certificates
A (digital) certificate is a small computer file issued by a certificate authority (CA) to authenticate a user, an application, or an organization. A digital certificate contains information such as the name of the certificate holder, the serial number of the certificate, the expiry date, the public key, and the signature of the issuing authority.
Certificate Authority (CA)
A certificate authority (CA) is a trusted third party that verifies an applicant's identity and issues an encrypted digital certificate containing the applicant's public key and other information. Notable CAs include VeriSign, Entrust, Let's Encrypt, and Safescript Limited.
Root CA/Certificate Chain
Certificate authority hierarchies are created to distribute the workload of issuing certificates, with different CAs issuing certificates at various levels. In this multi-level (parent and child) hierarchy of CAs, the CA at the top is called the root CA. Each CA has its own certificate issued by its parent CA, and the root CA has a self-signed certificate. To ensure the CA that issued a certificate to the client or server is trusted, the security protocol requires entities to send their digital certificate along with the entire chain leading up to the root CA.
Public and Private Key Pair
While creating certificates for an entity, the CA generates a public and a private key, commonly called a key pair. The public and private keys are used to authenticate the entity's identity and encrypt data. Public keys are published, but the private key is kept secret. If you are interested in the algorithms used to generate keys, read more about RSA, DSA, ECDSA, and ed25519.
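As a small illustration of the key pair concept above, the sketch below uses the standard Java security API to generate an RSA key pair. This is generic JDK code for illustration only, not something specific to mTLS or any particular CA product.

```java
// Minimal sketch: generate a public/private key pair, the raw material a CA binds into a certificate.
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Base64;

public class KeyPairExample {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);                 // key size in bits
        KeyPair pair = generator.generateKeyPair();

        // The public key is shared (e.g., embedded in an X.509 certificate);
        // the private key must stay secret with its owner.
        System.out.println("Public key (Base64): "
                + Base64.getEncoder().encodeToString(pair.getPublic().getEncoded()));
        System.out.println("Private key algorithm: " + pair.getPrivate().getAlgorithm());
    }
}
```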
X.509 Certificate
An X.509 certificate is a special category of certificate, defined by the International Telecommunication Union, that binds an application's identity (hostname, organization name, etc.) to a public key using a digital signature. It is the most commonly used certificate in the SSL/TLS/mTLS protocols for securing web applications.

How Does mTLS Work?
As explained earlier, mTLS implements sub-protocols similar to SSL's. Two applications talking to each other over mTLS go through the following phases:
1. Establish security capabilities with hello: The client tries to communicate with the server (the client hello). The client hello message contains values for specific parameters such as the mTLS version, session ID, cipher suite, and compression algorithm. The server sends a similar response, the server hello, with the values it supports for the same parameters.
2. Server authentication and key exchange: The server shares its digital certificate (mostly X.509 certificates for microservices) and the entire chain leading up to the root CA with the client. It also requests the client's digital certificate.
3. Client verifies the server's certificate: The client uses the public key in the digital certificate to validate the server's authenticity.
4. Client authentication and key exchange: After validation, the client sends its digital certificate to the server for verification.
5. Server verifies the client's certificate: The server verifies the client's authenticity.
6. Master key generation and handshake completion: Once both parties' authenticity is established, the client and server complete the handshake, and two new keys are generated; this shared secret information is known only to the two parties and is active for the session only:
   - Master secret: for encryption
   - Message authentication code (MAC): for assuring message integrity
7. Communication is encrypted: The exchange of information begins, with all messages or packets encrypted using the master secret key. Behind the veil, the mTLS protocol divides each message into smaller blocks called fragments, compresses each fragment, adds a MAC to each block, and finally encrypts everything using the master secret.
8. Data transmission starts: The mTLS protocol appends headers to the blocks of messages and hands them to the TCP protocol to send to the destination (the receiver).
9. Session ends: Once the communication completes, the session is closed. If an anomaly is detected during the transmission, the mTLS protocol destroys all keys and secrets and terminates the session immediately.
Note: In the phases above, we have assumed that the CA has issued certificates to the entities and that those certificates are still valid. In reality, certificates for mission-critical applications expire quickly, so there is a requirement for constant certificate rotation (which is exactly what we will look at next with Istio).
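Before turning to Istio, here is a minimal sketch of what the "both sides authenticate" requirement looks like when wired up by hand in plain Java. The keystore and truststore file names and passwords are placeholder assumptions; the essential point is setNeedClientAuth(true), which makes the server demand and verify a client certificate instead of accepting anonymous clients.

```java
// Minimal mTLS server sketch: presents its own certificate and requires a verifiable client certificate.
import javax.net.ssl.*;
import java.io.FileInputStream;
import java.security.KeyStore;

public class MutualTlsServer {
    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray();               // placeholder password

        KeyStore keyStore = KeyStore.getInstance("PKCS12");       // server certificate + private key
        keyStore.load(new FileInputStream("server-keystore.p12"), password);
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);

        KeyStore trustStore = KeyStore.getInstance("PKCS12");     // CA certificates used to verify clients
        trustStore.load(new FileInputStream("truststore.p12"), password);
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLSv1.3");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);

        SSLServerSocket server =
                (SSLServerSocket) context.getServerSocketFactory().createServerSocket(8443);
        server.setNeedClientAuth(true);                           // this flag is what makes the TLS mutual
        System.out.println("mTLS server listening on 8443");
    }
}
```

A service mesh such as Istio removes the need to write and maintain this kind of plumbing in every application, which is what the next section covers.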
How To Enable mTLS and Certificate Rotation Using Istio Service Mesh
Istio service mesh is an infrastructure layer that abstracts the network and security layers out of the application layer. It does so by injecting an Envoy proxy (an L4 and L7 sidecar proxy) into each application and listening to all of its network communication.

mTLS Implementation in Istio
Though Istio supports multiple authentication types, it is best known for implementing mTLS for applications hosted in the cloud, on-prem, or on Kubernetes infrastructure. The Envoy proxies act as policy enforcement points (PEPs); you can implement mTLS using the peer authentication policy provided by Istio and enforce it through the proxies at the workload level. Here is an example peer authentication policy in Istio that applies mTLS to the demobank app in the istio-nm namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: "mTLS-peer-policy"
  namespace: "istio-nm"
spec:
  selector:
    matchLabels:
      app: demobank
  mtls:
    mode: STRICT
```

The working mechanism of mTLS authentication in Istio is as follows:
- First, all outbound and inbound traffic for any application in the mesh is re-routed through the Envoy proxy.
- The mTLS exchange happens between the client-side Envoy proxy and the server-side Envoy proxy.
- The client-side Envoy proxy tries to connect with the server-side Envoy proxy; the two exchange certificates and prove their identities.
- Once the authentication phase completes successfully, a TCP connection between the client-side and server-side Envoy proxies is established to carry the encrypted communication.
Note that mTLS with Istio can be applied at any level: per workload, per namespace, or mesh-wide.

Certificate Management and Rotation in Istio Service Mesh
Istio provides strong identity by issuing X.509 certificates to the Envoy proxies attached to applications. Certificate management and rotation are handled by an Istio agent running in the same container as the Envoy proxy. The Istio agents talk to Istiod, the Istio control plane, to circulate digital certificates and public keys effectively. The detailed phases of certificate management in Istio are:
- The Istio agent generates a key pair (private and public keys) and sends the public key to the Istio control plane for signing. This is called a certificate signing request (CSR).
- Istiod has a component (formerly Citadel) that acts as the CA. Istiod validates the public key in the request, signs it, and issues a digital certificate to the Istio agent.
- When an mTLS connection is required, the Envoy proxy fetches the certificate from the Istio agent using the Envoy secret discovery service (SDS) API.
- The Istio agent monitors the expiration of the certificate used by Envoy. Upon the certificate's expiry, the agent initiates a new CSR to Istiod.

Network Security With Open-Source Istio
Microservices architecture is the norm nowadays. The distributed nature of these applications presents a large attack surface for intruders, since the applications communicate with each other over a network. Security cannot be an afterthought in such a scenario, as gaps can lead to catastrophic data breaches. Implementing mTLS with Istio is an effective way to secure communication between cloud-native applications, and many leading companies, such as Splunk, Airbnb, and Salesforce, use open-source Istio to enable mTLS and enhance the security of their applications.
This article is based on an interview with Kubernetes co-founder Brendan Burns, who appeared on the Dev Interrupted podcast. The success of Kubernetes was never preordained; it took years of work. While today it has grown into one of the largest and most popular open source projects and become the go-to API for building cloud-native applications worldwide, it almost wasn't even open source. Since its start in 2014, Kubernetes (an open-source orchestrator for deploying containerized applications) has transformed from a tiny project co-founded by three people, Joe Beda, Craig McLuckie, and Brendan Burns, into production-grade infrastructure that powers large-scale production applications in fields ranging from machine learning to online services. Created with the intent to simplify the task of building, deploying, and maintaining distributed systems, it allows people the world over to achieve new levels of velocity, agility, and reliability. Over the years, Kubernetes has grown into a large, successful open-source community, but it was a long journey getting there.
What Is a Container?
Before jumping into the history of Kubernetes, let's first define a "container," because the term often carries a broad set of meanings. Fundamentally, a container is the result of taking an application, packaging it by building a binary representation of the pieces that make up that application (the software, configuration files, etc.), and having a protocol for distributing it around the world. There are three pillars of what became known as containers:
1. Packaging everything so it's exactly the same, for instance, on a laptop as it is out in the cloud
2. Distributing the package from step 1 easily and around the world
3. Creating an environment where a bug in one component of the process doesn't affect another component
How It Started
When we interviewed Kubernetes co-founder Brendan Burns on the Dev Interrupted podcast, he told us that as an engineer, he found himself gravitating toward spaces with opportunity. While working at Google, he noticed that Cloud had a lot of white space and thought it would be an attractive space to work in. At the time, he led a small team of about seven engineers and decided to transition his team over to Cloud. At the same time, his eventual Kubernetes co-founders Joe and Craig created Compute Engine, the virtual machine product for Google Cloud, and the three of them began to work together in Google's Cloud division. While Joe and Craig focused on compute, Brendan worked in config management on things like CloudFormation and Terraform. Ideas were starting to coalesce, and the three of them were witnessing the challenges people faced trying to adopt Cloud, a far too difficult process. There were also some internal systems at Google, in particular a cluster manager called Borg that is still used today, that served as the initial inspiration for the three developers as they dreamt up Kubernetes. However, none of it would have been a reality without Docker.
Docker Changes Everything
As a prerequisite to the functionality of Kubernetes, people needed to be motivated to build container images and run containers. Docker was the startup that came along and convinced people why they should care about containers. All of a sudden, a use case existed, and an amazing opportunity presented itself. Docker didn't have a lot of experience at scale, and they were focused on one machine, with a container and daemon on that machine. What they were lacking was orchestration.
If a system was built that could act as a container-orchestrator, it represented not only a massive opportunity to change the market but to change the entire cloud landscape. When you think about what it takes to deploy an application up to the cloud or even to an on-premise collection of machines, it’s a lengthy process. It requires you to package up an application, distribute it, keep it running, have load-balanced traffic between the various pieces of the application, and an API to tie it all together. Prior to Kubernetes, some of these systems were in place, but nothing like today. Kubernetes is responsible for mainstreaming the idea of a distributed systems application environment for building and constructing distributed systems that span machines far and wide. With the need for orchestration realized, the next step was selling the idea to executives. Selling Open Source Convincing people that it was possible and a good idea was pretty straightforward. There were folks at Google who understood what Joe, Craig, and Brendan were trying to do. The real battle was fighting to make Kubernetes open source. As Brendan shared in our interview, they had a lot of internal arguments at Google about it being open source. Mostly it came down to money and control. From a business perspective, if a product or system is massively successful and you’re the only one who can sell it, then you’re in a great position. Conversely, Brendan told us that he always felt that Kubernetes would only be massively successful if it had an ecosystem, and the best way to foster an ecosystem was to make it open source. This viewpoint is centered around the community that comes together to build the software. An amazing community formed early on of people who helped build docs, who helped build tutorials, who would talk about their work at conferences, and then an ecosystem of companies that were betting their whole business on the success of Kubernetes. Startups began popping up, saying things like, “Well, the prerequisite for using my monitoring software is that you have a Kubernetes cluster.” All of the attention and goodwill formed a sort of virtuous cycle around Kubernetes. Success Has a Way of Looking Easy Soon enough, Kelsey Hightower, principal engineer for Google Cloud and Brendan’s co-author of the book Kubernetes: Up and Running: Dive into the Future of Infrastructure, came along and started doing a ton of evangelism and driving attention towards Kubernetes. It can be easy to look back and assume that it was easy because Kubernetes just took over. It's present in every major public cloud at this point. People expect it to be in new systems. But the truth is that in those early years, it took a lot of hard work to build and evangelize Kubernetes. Brendan shared with us that his hope for the future is that the bits of Kubernetes sort of fade into the background. It’ll be there, and it’ll be important, but it won’t be talked about or thought about from day to day because, as he puts it, “There’s so much more that needs to be built.”
You’ve probably heard that Generative AI has demonstrated the potential to disrupt a huge number of industries and jobs—and web3 is no exception. But how well can ChatGPT create smart contracts? Can using ChatGPT to code smart contracts make you a 10x developer? In this article, we’ll try it out and see. We’ll walk through writing and deploying an Ethereum smart contract using MetaMask, Infura, and Truffle - and we will ask ChatGPT for instructions on how to do everything from the code to the deployment. The goal is to see if we can break down the task of creating a smart contract into a set of simple prompts. Ideally, ChatGPT can parse those prompts and give good (and accurate) answers to them, making us a faster, and better, developer. Let’s see what we can do! A Note on Accuracy ChatGPT is a Large Language Model (LLM) extremely proficient in explaining blockchain concepts, providing advice, and even generating code snippets related to web3 development. However, keep in mind that some of the code or answers it generates may be partially or completely incorrect. Access to ChatGPT does not remove the need to know the basics of blockchain development. However, even a beginner can use artificial intelligence to dramatically increase productivity. The second point to note is that the free, public version of ChatGPT isn’t very aware of events, updates, and practices that have emerged after 2021. This can be an issue in a field such as web3 development, where everything is improving and changing at a breakneck pace. However, it is possible to get good answers if something that you’re asking is fundamental and has been stable over the years (for instance, Truffle and Solidity). Create an Escrow Contract With Truffle and ChatGPT To see what ChatGPT can do, we’re going to ask it to create an escrow smart contract that allows a buyer and a seller to exchange money, and we’ll have ChatGPT establish an arbiter to resolve disputes. Let’s see what Generative AI can do! Step 1: Install MetaMask First, we need to set up a wallet. We’ll use the MetaMask wallet—the most popular Ethereum option—and add the Sepolia test network. MetaMask is secure and easy to use. You can download the MetaMask extension for your browser here. Then, just install and set up the wallet. You’ll be given a “secret phrase”—keep this private and safe! Never store it anywhere public. Note: If you need background or a primer on web3, check out this previous article I wrote on learning web3/smart contracts/Solidity. Next, click on the “Network” tab in the top-right. Click on the option to show test networks. You should now see the Sepolia test network in the dropdown. Select that. Step 2: Get SepoliaETH To deploy and interact with a smart contract, you need ETH. Since we are using the Sepolia test network, we’ll need to get some Sepolia test ETH. Don’t worry—it’s free! You can obtain free SepoliaETH from Infura’s recently released faucet here. This faucet gives up to 0.5 SepoliaETH each day for free. Step 3: Install Node and NPM In order to build an escrow system using Truffle, we will need Node and NPM. To check to see if it’s installed, run: $ node -v You should see the Node version. Step 4: Sign Up for Infura Infura is an RPC provider. They provide easy API-based access to everything we need to connect to—and work with—the Ethereum blockchain (and quite a few others). Sign up for a free account. Then go to the dashboard and select Create New Key. 
Choose Web3 API as the network and name the new key Escrow (though you can name it whatever you want). Click Create, and Infura will generate an API key. This key is what you need to access your RPC endpoint. For our purposes, we just want the Sepolia endpoint (https://sepolia.infura.io/v3/<your API key>).
Step 5: Create the Node Project
Next, we need to create our Node project and then complete our setup with any necessary packages.
$ mkdir escrow && cd escrow
$ npm init -y
To deploy and test our contract, we'll use Truffle. Truffle is a suite of tools that gives you everything you need to develop, test, and deploy smart contracts. Let's install it:
$ npm install --save truffle
You can create a project skeleton using Truffle by running:
$ npx truffle init
To check that everything ran correctly:
$ npx truffle test
We now have Truffle successfully configured. Next, let's install the OpenZeppelin contracts package. This package gives us access to a few more helpful functionalities that we may require.
$ npm install @openzeppelin/contracts
We're almost done with the setup! Next, we have to allow MetaMask and Truffle to work together. We need Truffle to be able to sign transactions, pay gas, etc. For this, we'll use @truffle/hdwallet-provider. Install it:
$ npm install @truffle/hdwallet-provider
And finally, let's use dotenv to keep our wallet information secure.
$ npm install dotenv
Step 6: Sign Up for ChatGPT
We won't be writing the escrow smart contract code from scratch. Instead, we'll see what ChatGPT can do. In order to use ChatGPT, sign up for a free account with OpenAI. Once you've signed up for an account, open ChatGPT and you'll land in the chat interface.
Step 7: Create the Escrow Contract
Next, we'll have ChatGPT create an extremely simple escrow contract that allows a buyer and a seller to exchange a certain amount of money, and an arbiter to resolve disputes. Let's use the following prompt in ChatGPT:
As an expert smart contract developer, create a smart contract using Solidity that implements an escrow. There will be three parties: the buyer, the seller, and the arbiter. The constructor will set the buyer, seller, and arbiter. It will also set the value which will be equal to the amount sent to the constructor. There will be two functions. The first function allows the buyer or the arbiter to release funds to the seller. The second function allows the buyer or the arbiter to refund the amount back to the buyer. Implement any other functions that you may think are relevant.
Note the "As an expert smart contract developer" beginning of the prompt. This type of "role-playing" with ChatGPT works very well. ChatGPT will now generate a unique response for you, and the response changes every time, so you may not get exactly the output shown below, but it should be similar.
Sure, here's an example implementation of an escrow smart contract in Solidity:

```
pragma solidity ^0.8.0;

contract Escrow {
    address payable public buyer;
    address payable public seller;
    address public arbiter;
    uint public value;
    bool public released;
    bool public refunded;

    constructor(address payable _buyer, address payable _seller, address _arbiter) {
        buyer = _buyer;
        seller = _seller;
        arbiter = _arbiter;
        value = msg.value;
    }

    function release() public {
        require(!released, "Funds have already been released.");
        require(msg.sender == buyer || msg.sender == arbiter, "Only buyer or arbiter can release funds.");
        released = true;
        seller.transfer(value);
    }

    function refund() public {
        require(!refunded, "Funds have already been refunded.");
        require(msg.sender == buyer || msg.sender == arbiter, "Only buyer or arbiter can refund funds.");
        refunded = true;
        buyer.transfer(value);
    }

    function getBalance() public view returns (uint) {
        return address(this).balance;
    }

    function setArbiter(address _arbiter) public {
        require(msg.sender == arbiter, "Only arbiter can set arbiter.");
        arbiter = _arbiter;
    }

    function setSeller(address payable _seller) public {
        require(msg.sender == seller, "Only seller can set seller.");
        seller = _seller;
    }
}
```

The `Escrow` contract has a constructor that takes three arguments: the addresses of the buyer, seller, and arbiter, and sets the `value` equal to the amount sent to the constructor.
The `release()` function allows the buyer or the arbiter to release the funds to the seller. It first checks that the funds haven't already been released, and that the caller is either the buyer or the arbiter. If both conditions are met, the `released` variable is set to `true`, and the `value` is transferred to the seller's address.
The `refund()` function allows the buyer or the arbiter to refund the funds back to the buyer. It first checks that the funds haven't already been refunded, and that the caller is either the buyer or the arbiter. If both conditions are met, the `refunded` variable is set to `true`, and the `value` is transferred back to the buyer's address.
The `getBalance()` function returns the balance of the contract. The `setArbiter()` function allows the arbiter to set a new arbiter. The `setSeller()` function allows the seller to set a new seller.
Note that this is a very basic implementation of an escrow contract and should not be used in production without thorough testing and auditing. Also, there may be additional features that you may want to add to suit your specific use case.

Upon inspection, the snippet provided by ChatGPT is pretty good! But it does have an error: the constructor is not marked payable even though we're accessing msg.value and expecting Ether to be sent to it. Again, an important note: never expect ChatGPT to give you code that is 100% correct. More often than not, you will have to make small corrections like we did here. Also, notice that ChatGPT uses an earlier version of Solidity and a fund-transfer pattern that is no longer considered best practice. For the purpose of this tutorial, we'll let that be. In the contracts folder of your project, create a new file called Escrow.sol and add the code provided by ChatGPT with the corrections.
```
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract Escrow {
    address payable public buyer;
    address payable public seller;
    address public arbiter;
    uint public value;
    bool public released;
    bool public refunded;

    constructor(address payable _buyer, address payable _seller, address _arbiter) payable {
        buyer = _buyer;
        seller = _seller;
        arbiter = _arbiter;
        value = msg.value;
    }

    function release() public {
        require(!released, "Funds have already been released.");
        require(msg.sender == buyer || msg.sender == arbiter, "Only buyer or arbiter can release funds.");
        released = true;
        seller.transfer(value);
    }

    function refund() public {
        require(!refunded, "Funds have already been refunded.");
        require(msg.sender == buyer || msg.sender == arbiter, "Only buyer or arbiter can refund funds.");
        refunded = true;
        buyer.transfer(value);
    }

    function getBalance() public view returns (uint) {
        return address(this).balance;
    }

    function setArbiter(address _arbiter) public {
        require(msg.sender == arbiter, "Only arbiter can set arbiter.");
        arbiter = _arbiter;
    }

    function setSeller(address payable _seller) public {
        require(msg.sender == seller, "Only seller can set seller.");
        seller = _seller;
    }
}
```

Make sure the contract is compiling correctly by running:
$ npx truffle compile
Step 8: Update Your Truffle Config
Next, we'll do a little configuring without ChatGPT. This is a standard setup and will depend on the environment we're using. Create a new .env file in your project's root directory and add the following:
INFURA_API_KEY = "https://sepolia.infura.io/v3/<insert API key here>"
MNEMONIC = "<insert metamask secret phrase here>"
Now let's configure the Infura endpoint. In truffle-config.js, replace the existing code with this standard config:

```
require('dotenv').config();
const HDWalletProvider = require('@truffle/hdwallet-provider');
const { INFURA_API_KEY, MNEMONIC } = process.env;

module.exports = {
  networks: {
    development: {
      host: "127.0.0.1",
      port: 8545,
      network_id: "*"
    },
    sepolia: {
      provider: () => new HDWalletProvider(MNEMONIC, INFURA_API_KEY),
      network_id: '11155111',
    }
  }
};
```

Step 9: Deploy the Contract
Now it's time to deploy! Instead of giving you the script to deploy our contract using Truffle, let's ask ChatGPT for the relevant instructions. In the same window that you used to ask for the escrow contract code, we'll enter the following prompt:
Give me a migration script that deploys the Escrow contract using Truffle. I need to send 0.001 ETH to the constructor during deployment. And I need the address that deploys the contract to be the buyer.
We get this response:
Sure, here's an example migration script that deploys the `Escrow` contract and sends 0.001 ETH to the constructor during deployment. In this script, the address that deploys the contract will be set as the buyer.
1. Create a new JavaScript file named `2_deploy_escrow.js` (the number at the beginning indicates the order in which the migrations will be executed).
2. Copy the following code into `2_deploy_escrow.js`:

```
const Escrow = artifacts.require("Escrow");

module.exports = async function(deployer, network, accounts) {
  const buyer = accounts[0]; // set the buyer as the address that deploys the contract
  const seller = "0x2222222222222222222222222222222222222222"; // replace with seller address
  const arbiter = "0x3333333333333333333333333333333333333333"; // replace with arbiter address
  const value = web3.utils.toWei('0.001', 'ether'); // change to the value you want to send
  await deployer.deploy(Escrow, seller, arbiter, { from: buyer, value: value });
};
```

3. Replace the seller and arbiter addresses with the actual Ethereum addresses of the parties involved in the escrow transaction.
4. Save the file.
5. Make sure you have a network configured in your `truffle-config.js` file. If you want to deploy to a testnet, you'll need to configure your Truffle project to use the network you want to deploy to.
6. Run `truffle migrate --network <network-name>` to deploy the contract to the specified network. Replace `<network-name>` with the name of the network you want to deploy to, e.g. `ropsten` for the Ropsten testnet.
This will deploy the contract and send 0.001 ETH to the constructor with the address that deploys the contract set as the buyer.

Once again, the instructions provided by ChatGPT are mostly correct. It gets the naming wrong (we want the file to be prefixed with 1_ and not 2_) and makes a minor mistake in setting the arguments of the deploy function. Let's create a new 1_deploy_contract.js file in the migrations folder and add the following, slightly modified code:

```
const Escrow = artifacts.require("Escrow");

module.exports = async function (deployer, network, accounts) {
  const buyer = accounts[0];   // set the buyer as the address that deploys the contract
  const seller = accounts[1];  // replace with seller address
  const arbiter = accounts[2]; // replace with arbiter address
  const value = web3.utils.toWei('0.001', 'ether'); // change to the value you want to send
  await deployer.deploy(Escrow, seller, arbiter, { from: buyer, value: value });
};
```

And finally, let's deploy our contract:
$ truffle migrate --network sepolia
You should see something similar to this:

```
Compiling your contracts...
===========================
> Everything is up to date, there is nothing to compile.

Migrations dry-run (simulation)
===============================
> Network name:    'sepolia-fork'
> Network id:      11155111
> Block gas limit: 30000000 (0x1c9c380)

1_deploy_contract.js
====================

   Deploying 'Escrow'
   ------------------
   > block number:        3400252
   > block timestamp:     1682974543
   > account:             0xc361Fc33b99F88612257ac8cC2d852A5CEe0E217
   > balance:             0.506876109994053108
   > gas used:            849556 (0xcf694)
   > gas price:           2.500000007 gwei
   > value sent:          0.001 ETH
   > total cost:          0.003123890005946892 ETH

   -------------------------------------
   > Total cost:     0.003123890005946892 ETH

Summary
=======
> Total deployments:   1
> Final cost:          0.003123890005946892 ETH

Starting migrations...
======================
> Network name:    'sepolia'
> Network id:      11155111
> Block gas limit: 30000000 (0x1c9c380)

1_deploy_contract.js
====================

   Deploying 'Escrow'
   ------------------
   > transaction hash:    0x7a7804d92a6b6e805991eeb8249d79a2b8c5da43cffe633a31f987e9fe596654
   > Blocks: 1            Seconds: 33
   > contract address:    0x64ccE52898F5d61380D2Ec8C02F2EF16F28436de
   > block number:        3400258
   > block timestamp:     1682974584
   > account:             0xc361Fc33b99F88612257ac8cC2d852A5CEe0E217
   > balance:             0.506876109994053108
   > gas used:            849556 (0xcf694)
   > gas price:           2.500000007 gwei
   > value sent:          0.001 ETH
   > total cost:          0.003123890005946892 ETH

   > Saving artifacts
   -------------------------------------
   > Total cost:     0.003123890005946892 ETH

Summary
=======
> Total deployments:   1
> Final cost:          0.003123890005946892 ETH
```

To validate deployment, you can find your contract's address on Sepolia Etherscan. Congratulations! You've deployed your contract. And you did it using ChatGPT to generate more than 95% of your code.
Tips and Tricks
Here are a few tips and tricks for using ChatGPT when coding:
- ChatGPT has limited knowledge, with a knowledge-base cutoff of September 2021, so some best practices and the latest developments in smart contracts will be missing from its answers.
- There are several versions of ChatGPT. GPT-4 tends to give better answers but is a paid service. GPT-3.5 (which we used here) is free and does pretty well.
- Make your questions as specific as possible.
- Don't stop at writing code. ChatGPT can debug your code, audit it, document it, make it pretty, analyze it, find vulnerabilities, and more.
Conclusion
ChatGPT was an extremely useful co-developer. But it wasn't perfect. I wouldn't use it to create unmodified code that I pushed to mainnet, especially if I were a new developer. But as a coding partner, and as a place to start, it was helpful. It saved me significant time and created surprisingly good code. In the hands of an experienced developer, I believe it can increase productivity by many times. I plan on using it quite heavily on future projects, alongside MetaMask, Infura, and Truffle. Have a really great day!
Direct image upload processes create a highly efficient path between client-side users and a website’s underlying file storage instances, significantly benefiting both ends of the client/web service relationship. Due largely to the ever-increasing availability (and affordability) of cloud storage resources for independent developer projects and small business ventures, we see the option to upload our own image files more and more frequently everywhere we look online, growing in tandem with steady demand for new forms of social engagement and commerce. The trouble is, however, that file upload security is a very serious issue - and image files are rather easily exploited by client-side threat actors. Image files aren’t unique in this respect, of course (many common file formats including PDF, DOCX, etc., for example, can house a variety of hidden threats), but their monumental value on the internet – a mostly visual platform – sets them apart as one of the more expedient vessels for malicious content. Attackers can easily inject malware and other malicious code directly into image files using honed steganographic techniques, reliably avoiding detection from poorly configured upload security policies. Malware can be hidden in several different ways within an image file – bluntly appended to the end of a file, subtly incorporated through minor code changes, or even concealed in the image’s metadata or EXIF data. Malicious code is generally designed to execute remotely or upon file opening, meaning dormant, undetected threats in storage can wait days, weeks, or even months before suddenly unleashing dangerous content. It isn’t just the website’s system they can exploit, too: if an unsuspecting client-side user downloads an infected file, their device can be quickly compromised, badly (perhaps permanently) damaging the website’s reputation. Mitigating image file upload threats starts with implementing powerful virus and malware detection policies, and it also involves putting sensible file upload validation measures in place. Unusually large image files, for example, might indicate a hidden threat, so understanding (and possibly standardizing) the size of image uploads can help facilitate quicker threat detection. Moreover, limiting the number of different file extensions allowed for upload (for example, limiting to PNG or JPG) makes file extension validation easier and more efficient to carry out. File extensions and headers shouldn’t be trusted blindly, either – thorough content verification should always take the file structure and file encoding into consideration. Demonstration In the remainder of this article, I’ll demonstrate two simple, free-to-use solutions which can help virus scan and validate image file uploads prior to reaching cloud storage. Both can be taken advantage of efficiently using complementary, ready-to-run Java code examples to structure your API calls. These APIs perform the following functions respectively: Scan image files for viruses Validate image files Used in conjunction with one another, both APIs can help ensure image uploads are valid and free of viruses and malware, significantly mitigating the risks associated with direct image file uploads. Scan an Image File for Viruses This API is equipped with more than 17 million virus and malware signatures, covering extremely common threats like trojans, ransomware, and spyware among others. 
It isn’t limited to image files either (you can also scan documents like PDF, DOCX, XLSX, etc.), so it offers some versatility if your file upload process accepts multiple file types. All scanned files will ultimately receive a "CleanResult: True" or "CleanResult: False" Boolean response; if false, the name of the detected virus will be provided in the API response. To install the client SDK, first, add a reference to the repository in your Maven POM File. Jitpack is used to dynamically compile the library: XML <repositories> <repository> <id>jitpack.io</id> <url>https://jitpack.io</url> </repository> </repositories> After that, add a reference to the dependency: XML <dependencies> <dependency> <groupId>com.github.Cloudmersive</groupId> <artifactId>Cloudmersive.APIClient.Java</artifactId> <version>v4.25</version> </dependency> </dependencies> With installation out of the way, you can structure your API call using the following complementary code examples: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.ScanApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); ScanApi apiInstance = new ScanApi(); File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on. try { VirusScanResult result = apiInstance.scanFile(inputFile); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling ScanApi#scanFile"); e.printStackTrace(); } When testing this solution, I’d recommend thoroughly researching options for inert files that can safely trigger a "CleanResult: False" response (Eicar files, for example, are often a popular choice in this regard). Validate an Image File This API is designed to rigorously validate dozens of common input image types, including JPG, PNG, WEBP, GIF, and many more. It’ll identify whether the content contained within an image upload matches its extension, whether the file is password protected, and if there are any errors and warnings present within the file. If any errors are detected, the API response will provide a description of the error, a path to the error, and a URI for reference. You can install this client SDK the same way as before. 
Add this reference to your Maven POM file repository: XML <repositories> <repository> <id>jitpack.io</id> <url>https://jitpack.io</url> </repository> </repositories> Then add a reference to the dependency: XML <dependencies> <dependency> <groupId>com.github.Cloudmersive</groupId> <artifactId>Cloudmersive.APIClient.Java</artifactId> <version>v4.25</version> </dependency> </dependencies> Finally, you can structure your API call using the ready-to-run code examples below: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.ValidateDocumentApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); ValidateDocumentApi apiInstance = new ValidateDocumentApi(); File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on. try { DocumentValidationResult result = apiInstance.validateDocumentImageValidation(inputFile); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling ValidateDocumentApi#validateDocumentImageValidation"); e.printStackTrace(); }
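To round things off, here is a rough sketch of how the two calls above might be chained into a single pre-storage gate. It is a hedged example rather than production code: the API key, file path, and model-class package names are assumptions based on the SDK layout shown earlier, and the CleanResult and document-validity fields in the two responses are what you would branch on before moving anything into cloud storage.

```java
import com.cloudmersive.client.ScanApi;
import com.cloudmersive.client.ValidateDocumentApi;
import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.ApiKeyAuth;
import com.cloudmersive.client.model.DocumentValidationResult;
import com.cloudmersive.client.model.VirusScanResult;

import java.io.File;

public class ImageUploadGate {

    public static void main(String[] args) {
        // Same authentication setup as in the individual examples above.
        ApiClient defaultClient = Configuration.getDefaultApiClient();
        ApiKeyAuth apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
        apikey.setApiKey("YOUR API KEY");

        File upload = new File("/path/to/uploaded-image.png"); // placeholder path

        try {
            // Step 1: confirm the upload really is a structurally valid image.
            DocumentValidationResult validation =
                    new ValidateDocumentApi().validateDocumentImageValidation(upload);
            System.out.println("Validation result: " + validation);

            // Step 2: check the same file against the virus/malware signature database.
            VirusScanResult scan = new ScanApi().scanFile(upload);
            System.out.println("Virus scan result: " + scan);

            // Only when the validation response reports a valid image AND the scan's
            // CleanResult field is true should the file be handed to cloud storage.
        } catch (ApiException e) {
            // Treat any API failure as a rejection rather than letting the upload through.
            System.err.println("Upload rejected: " + e.getMessage());
        }
    }
}
```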
Apache Kafka and Apache Flink are increasingly joining forces to build innovative real-time stream processing applications. This blog post explores the benefits of combining both open-source frameworks, shows unique differentiators of Flink versus Kafka, and discusses when to use a Kafka-native streaming engine like Kafka Streams instead of Flink. The Tremendous Adoption of Apache Kafka and Apache Flink Apache Kafka became the de facto standard for data streaming. The core of Kafka is messaging at any scale in combination with a distributed storage (= commit log) for reliable durability, decoupling of applications, and replayability of historical data. Kafka also includes a stream processing engine with Kafka Streams. And KSQL is another successful Kafka-native streaming SQL engine built on top of Kafka Streams. Both are fantastic tools. In parallel, Apache Flink became a very successful stream-processing engine. The first prominent Kafka + Flink case study I remember is the fraud detection use case of ING Bank. The first publications came up in 2017, i.e., over five years ago: "StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Kafka and Apache Flink." This is just one of many Kafka fraud detection case studies. One of the last case studies I blogged about goes in the same direction: "Why DoorDash migrated from Cloud-native Amazon SQS and Kinesis to Apache Kafka and Flink." The adoption of Kafka is already outstanding. And Flink gets into enterprises more and more, very often in combination with Kafka. This article is no introduction to Apache Kafka or Apache Flink. Instead, I explore why these two technologies are a perfect match for many use cases and when other Kafka-native tools are the appropriate choice instead of Flink. Top Reasons Apache Flink Is a Perfect Complementary Technology for Kafka Stream processing is a paradigm that continuously correlates events of one or more data sources. Data is processed in motion, in contrast to traditional processing at rest with a database and request-response API (e.g., a web service or a SQL query). Stream processing is either stateless (e.g., filter or transform a single message) or stateful (e.g., an aggregation or sliding window). Especially state management is very challenging in a distributed stream processing application. A vital advantage of the Apache Flink engine is its efficiency in stateful applications. Flink has expressive APIs, advanced operators, and low-level control. But Flink is also scalable in stateful applications, even for relatively complex streaming JOIN queries. Flink's scalable and flexible engine is fundamental to providing a tremendous stream processing framework for big data workloads. But there is more. The following aspects are my favorite features and design principles of Apache Flink: Unified streaming and batch APIs Connectivity to one or multiple Kafka clusters Transactions across Kafka and Flink Complex Event Processing Standard SQL support Machine Learning with Kafka, Flink, and Python But keep in mind that every design approach has pros and cons. While there are a lot of advantages, sometimes it is also a drawback. Unified Streaming and Batch APIs Apache Flink's DataStream API unifies batch and streaming APIs. It supports different runtime execution modes for stream processing and batch processing, from which you can choose the right one for your use case and the characteristics of your job. 
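To make the execution-mode idea more tangible, here is a minimal, hedged DataStream sketch (a recent Flink release with the unified DataStream API is assumed). The same pipeline runs as a streaming job by default and as a batch job once the runtime mode is switched for a bounded input:

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExecutionModeExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // STREAMING is the default. BATCH is more efficient for bounded inputs,
        // and AUTOMATIC lets Flink pick based on whether all sources are bounded.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // A bounded toy source; the same code would work against an unbounded Kafka topic.
        env.fromElements(1, 2, 3, 4, 5)
           .filter(value -> value % 2 == 0)
           .print();

        env.execute("bounded-example-job");
    }
}
```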
In the case of SQL/Table API, the switch happens automatically based on the characteristics of the sources: all bounded events go into batch execution mode; at least one unbounded event means STREAMING execution mode. The unification of streaming and batch brings a lot of advantages: Reuse of logic/code for real-time and historical processing Consistent semantics across stream and batch processing A single system to operate Applications mixing historical and real-time data processing This sounds similar to Apache Spark. But there is a significant difference: Contrary to Spark, the foundation of Flink is data streaming, not batch processing. Hence, streaming is the default execution runtime mode in Apache Flink. Continuous stateless or stateful processing enables real-time streaming analytics using an unbounded stream of events. Batch execution is more efficient for bounded jobs (i.e., a bounded subset of a stream) for which you have a known fixed input and which do not run continuously. This executes jobs in a way that is more reminiscent of batch processing frameworks, such as MapReduce in the Hadoop and Spark ecosystems. Apache Flink makes moving from a Lambda to Kappa enterprise architecture easier. The foundation of the architecture is real-time, with Kafka as its heart. But batch processing is still possible out-of-the-box with Kafka and Flink using consistent semantics. Though, this combination will likely not (try to) replace traditional ETL batch tools, e.g., for a one-time lift-and-shift migration of large workloads. Connectivity to One or Multiple Kafka Clusters Apache Flink is a separate infrastructure from the Kafka cluster. This has various pros and cons. First, I often emphasize the vast benefit of Kafka-native applications: you only need to operate, scale, and support one infrastructure for end-to-end data processing. A second infrastructure adds additional complexity, cost, and risk. However, imagine a cloud vendor taking over that burden, so you consume the end-to-end pipeline as a single cloud service. With that in mind, let's look at a few benefits of separate clusters for the data hub (Kafka) and the stream processing engine (Flink): Focus on data processing in a separate infrastructure with dedicated APIs and features independent of the data streaming platform. More efficient streaming pipelines before hitting the Kafka Topics again; the data exchange happens directly between the Flink workers. Data processing across different Kafka topics of independent Kafka clusters of different business units. If it makes sense from a technical and organizational perspective, you can connect directly to non-Kafka sources and sinks. But be careful, this can quickly become an anti-pattern in the enterprise architecture and create complex and unmanageable "spaghetti integrations". Implement new fail-over strategies for applications. I emphasize Flink is usually NOT the recommended choice for implementing your aggregation, migration, or hybrid integration scenario. Multiple Kafka clusters for hybrid and global architectures are the norm, not an exception. Flink does not change these architectures. Kafka-native replication tools like MirrorMaker 2 or Confluent Cluster Linking are still the right choice for disaster recovery. It is still easier to do such a scenario with just one technology. Tools like Cluster Linking solve challenges like offset management out-of-the-box. 
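For the data processing scenarios listed above (not replication or disaster recovery), a hedged sketch of such a pipeline could look like the following; the broker addresses, topic names, and consumer group are placeholders. The job reads from one Kafka cluster, applies a trivial stateless step inside Flink, and writes to a second cluster using an exactly-once sink, which leads nicely into the next section:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CrossClusterPipeline {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // required for the exactly-once sink below

        // Consume from Kafka cluster A (placeholder addresses and topic names).
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka-cluster-a:9092")
                .setTopics("orders")
                .setGroupId("flink-orders-pipeline")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Produce to a different Kafka cluster B with transactional, exactly-once writes.
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("kafka-cluster-b:9092")
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("orders-filtered")
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("orders-pipeline")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "orders-from-cluster-a")
           .filter(value -> value != null && !value.isEmpty()) // stateless step between the clusters
           .sinkTo(sink);

        env.execute("cross-cluster-pipeline");
    }
}
```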
Transactions Across Kafka and Flink Workloads for analytics and transactions have very unlike characteristics and requirements. The use cases differ significantly. SLAs are very different, too. Many people think that data streaming is not built for transactions and should only be used for big data analytics. However, Apache Kafka and Apache Flink are deployed in many resilient, mission-critical architectures. The concept of exactly-once semantics (EOS) allows stream processing applications to process data through Kafka without loss or duplication. This ensures that computed results are always accurate. Transactions are possible across Kafka and Flink. The feature is mature and battle-tested in production. Operating separate clusters is still challenging for transactional workloads. However, a cloud service can take over this risk and burden. Many companies already use EOS in production with Kafka Streams. But EOS can even be used if you combine Kafka and Flink. That is a massive benefit if you choose Flink for transactional workloads. So, to be clear: EOS is not a differentiator in Flink (vs. Kafka Streams), but it is an excellent option to use EOS across Kafka and Flink, too. Complex Event Processing With FlinkCEP The goal of complex event processing (CEP) is to identify meaningful events in real-time situations and respond to them as quickly as possible. CEP does usually not send continuous events to other systems but detects when something significant occurs. A common use case for CEP is handling late-arriving events or the non-occurrence of events. The big difference between CEP and event stream processing (ESP) is that CEP generates new events to trigger action based on situations it detects across multiple event streams with events of different types (situations that build up over time and space). ESP detects patterns over event streams with homogenous events (i.e. patterns over time). Pattern matching is a technique to implement either pattern but the features look different. FlinkCEP is an add-on for Flink to do complex event processing. The powerful pattern API of FlinkCEP allows you to define complex pattern sequences you want to extract from your input stream. After specifying the pattern sequence, you apply them to the input stream to detect potential matches. This is also possible with SQL via the MATCH_RECOGNIZE clause. Standard SQL Support Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS). However, it is so predominant that other technologies like non-relational databases (NoSQL) and streaming platforms adopt it, too. SQL became a standard of the American National Standards Institute (ANSI) in 1986 and the International Organization for Standardization (ISO) in 1987. Hence, if a tool supports ANSI SQL, it ensures that any 3rd party tool can easily integrate using standard SQL queries (at least in theory). Apache Flink supports ANSI SQL, including the Data Definition Language (DDL), Data Manipulation Language (DML), and Query Language. Flink’s SQL support is based on Apache Calcite, which implements the SQL standard. This is great because many personas, including developers, architects, and business analysts, already use SQL in their daily job. The SQL integration is based on the so-called Flink SQL Gateway, which is part of the Flink framework allowing other applications to interact with a Flink cluster through a REST API easily. 
User applications (e.g., Java/Python/Shell program, Postman) can use the REST API to submit queries, cancel jobs, retrieve results, etc. This enables a possible integration of Flink SQL with traditional business intelligence tools like Tableau, Microsoft Power BI, or Qlik. However, to be clear, ANSI SQL was not built for stream processing. Incorporating Streaming SQL functionality into the official SQL standard is still in the works. The Streaming SQL working group includes database vendors like Microsoft, Oracle, and IBM, cloud vendors like Google and Alibaba, and data streaming vendors like Confluent. More details: "The History and Future of SQL: Databases Meet Stream Processing". Having said this, Flink supports continuous sliding windows and various streaming joins via ANSI SQL. There are things that require additional non-standard SQL keywords but continuous sliding windows or streaming joins, in general, are possible. Machine Learning with Kafka, Flink, and Python In conjunction with data streaming, machine learning solves the impedance mismatch of reliably bringing analytic models into production for real-time scoring at any scale. I explored ML deployments within Kafka applications in various blog posts, e.g., embedded models in Kafka Streams applications or using a machine learning model server with streaming capabilities like Seldon. PyFlink is a Python API for Apache Flink that allows you to build scalable batch and streaming workloads, such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines, and ETL processes. If you’re already familiar with Python and libraries such as Pandas, then PyFlink makes it simpler to leverage the full capabilities of the Flink ecosystem. PyFlink is the missing piece for an ML-powered data streaming infrastructure, as almost every data engineer uses Python. The combination of Tiered Storage in Kafka and Data Streaming with Flink in Python is excellent for model training without the need for a separate data lake. When To Use Kafka Streams Instead of Apache Flink Don't underestimate the power and use cases of Kafka-native stream processing with Kafka Streams. The adoption rate is massive, as Kafka Streams is easy to use. And it is part of Apache Kafka. To be clear: Kafka Streams is already included if you download Kafka from the Apache website. Kafka Streams Is a Library, Apache Flink Is a Cluster The most significant difference between Kafka Streams and Apache Flink is that Kafka Streams is a Java library, while Flink is a separate cluster infrastructure. Developers can deploy the Flink infrastructure in session mode for bigger workloads (e.g., many small, homogenous workloads like SQL queries) or application mode for fewer bigger, heterogeneous data processing tasks (e.g., isolated applications running in a Kubernetes cluster). No matter your deployment option, you still need to operate a complex cluster infrastructure for Flink (including separate metadata management on a ZooKeeper cluster or an etcd cluster in a Kubernetes environment). TL;DR: Apache Flink is a fantastic stream processing framework and a top #5 Apache open-source project. But it is also complex to deploy and difficult to manage. Benefits of Using the Lightweight Library of Kafka Streams Kafka Streams is a single Java library. 
This adds a few benefits: Kafka-native integration supports critical SLAs and low latency for end-to-end data pipelines and applications with a single cluster infrastructure instead of operating separate messaging and processing engines with Kafka and Flink. Kafka Streams apps still run in their VMs or Kubernetes containers, but high availability and persistence are guaranteed via Kafka Topics. Very lightweight with no other dependencies (Flink needs S3 or similar storage as the state backend) Easy integration into testing/CI/DevOps pipelines Embedded stream processing into any existing JVM application, like a lightweight Spring Boot app or a legacy monolith built with old Java EE technologies like EJB. Interactive Queries allow leveraging the state of your application from outside your application. The Kafka Streams API enables your applications to be queryable. Flink's similar feature "queryable state" is approaching the end of its life due to a lack of maintainers. Kafka Streams is well-known for building independent, decoupled, lightweight microservices. This differs from submitting a processing job into the Flink (or Spark) cluster; each data product team controls its destiny (e.g., don’t depend on the central Flink team for upgrades or get forced to upgrade). Flink's application mode enables a similar deployment style for microservices. But: Kafka Streams and Apache Flink Live In Different Parts of a Company Today, Kafka Streams and Flink are usually used for different applications. While Flink provides an application mode to build microservices, most people use Kafka Streams for this today. Interactive queries are available in Kafka Streams and Flink, but it got deprecated in Flink as there is not much demand from the community. These are two examples that show that there is no clear winner. Sometimes Flink is the better choice, and sometimes Kafka Streams makes more sense. "In summary, while there certainly is an overlap between the Streams API in Kafka and Flink, they live in different parts of a company, largely due to differences in their architecture and thus we see them as complementary systems." That's the quote of a "Kafka Streams vs. Flink comparison" article written in 2016 (!) by Stephan Ewen, former CTO of Data Artisans, and Neha Narkhede, former CTO of Confluent. While some details changed over time, this old blog post is still pretty accurate today and a good read for a more technical audience. The domain-specific language (DSL) of Kafka Streams differs from Flink but is also very similar. How are both characteristics possible? It depends on who you ask. This (legitimate) subject for debate often segregates Kafka Streams and Flink communities. Kafka Streams has Stream and Table APIs. Flink has DataStream, Table, and SQL API. I guess 95% of use cases can be built with both technologies. APIs, infrastructure, experience, history, and many other factors are relevant for choosing the proper stream processing framework. Some architectural aspects are very different in Kafka Streams and Flink. These need to be understood and can be a pro or con for your use case. For instance, Flink's checkpointing has the advantage of getting a consistent snapshot, but the disadvantage is that every local error always stops the whole job and everything has to be rolled back to the last checkpoint. Kafka Streams does not have this concept. Local errors can be recovered locally (move the corresponding tasks somewhere else; the task/threads without errors just continue normally). 
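To make the "library, not cluster" point concrete, here is a minimal, hedged Kafka Streams sketch (the application ID, bootstrap server, and topic names are assumptions). The entire deployment unit is just a plain Java main() method that you can run wherever a JVM runs:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class OrdersFilterApp {

    public static void main(String[] args) {
        // Plain Java configuration; there is no job submission or cluster to manage.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-filter");     // assumption
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((key, value) -> value != null && value.contains("\"status\":\"PAID\""))
               .to("paid-orders", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the topology cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

There is no job submission here: packaging this class into a Spring Boot app, a container, or even a legacy monolith is all it takes to run stream processing next to the rest of your application code.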
A further architectural difference is Kafka Streams' hot standby for high availability versus Flink's fault-tolerant checkpointing system. Kafka + Flink = A Powerful Combination for Stream Processing Apache Kafka is the de facto standard for data streaming. It includes Kafka Streams, a widely used Java library for stream processing. Apache Flink is an independent and successful open-source project offering a stream processing engine for real-time and batch workloads. The combination of Kafka (including Kafka Streams) and Flink is already widespread in enterprises across all industries. Both Kafka Streams and Flink have benefits and trade-offs for stream processing. The freedom to choose between these two leading open-source technologies, plus the tight integration of Kafka with Flink, enables almost any stream processing use case. This includes hybrid, global, and multi-cloud deployments, mission-critical transactional workloads, and powerful analytics with embedded machine learning. As always, understand the different options and choose the right tool for your use case and requirements. What is your favorite for stream processing: Kafka Streams, Apache Flink, or another open-source or proprietary engine? In which use cases do you leverage stream processing? Let’s connect on LinkedIn and discuss it!
TL; DR: Life Is a Negotiation; Why Would Scrum Be Different? Life is a negotiation. Why would Scrum be different, particularly given its egalitarian nature? As you may recall, no one on a Scrum team can tell anyone else what to do, how to do it, or when to do it. Instead, solving your customers’ problems in a complex environment requires communication skills, empathy, patience, diplomacy, and professionalism. So let’s have a look at some typical agile negotiation scenarios. Agile Negotiation Levels For Scrum to work well, it’s essential that the Scrum Team and stakeholders continuously discuss how to align their objectives, expectations, practices, and principles. These conversations guarantee the team can deliver customer value within the given constraints while contributing to the organization’s sustainability. While the Scrum Guide mentions several examples of this agile negotiation, many others derive from practicing agility in established organizations. There are different agile negotiation scenarios, the most prominent being the team-internal and team-stakeholder levels: Examples of Team-Internal Negotiations These areas cover the practical work of a Scrum team, from Product Backlog refinement to the Sprint to the Retrospective. As a result, the lists are far from comprehensive. However, they should allow for the discovery of additional scenarios to help prepare for them: The Product Backlog and Refinement Level Scope negotiation: The Product Owner, team members, and stakeholders discuss the project’s or product’s scope, negotiating any adjustments or changes that might be necessary. Product Backlog refinement: The Product Owner and Developers collaborate to refine, estimate, and order the Product Backlog items based on value, risk, and dependencies. Balancing technical debt and new features: The Scrum team needs to negotiate how to balance addressing technical debt while delivering new features, considering quality, maintainability, and customer and business needs, respectively. Acceptance Criteria: The Product Owner, Developers, and probably stakeholders negotiate the specific requirements that need to be satisfied for a Product Backlog item to be accepted. Estimating Effort: The Developers may negotiate the effort required to complete each Product Backlog item using techniques such as Planning Poker, T-Shirt Sizing, or the Bucket System. The Sprint Planning Level Sprint Goal: The Scrum team members align on a Sprint Goal, defining what the team aims to achieve based on the current Sprint’s business objectives. Technical decisions: Developers negotiate architectural choices, design patterns, and code practices to implement the best technical solutions. Allocating tasks and responsibilities: Developers negotiate the allocation of tasks and responsibilities based on their skills, expertise, and capacity among themselves. The Sprint Level Resolving conflicts and issues: Disagreements and conflicts may arise during the Sprint, requiring team members to negotiate and find solutions. Clarification of requirements: Developers may need further clarification or details on Product Backlog items from the Sprint Backlog. They might negotiate with the Product Owner to refine the acceptance criteria or other specifications to ensure a clear understanding of what they need to deliver. Changes in priority: Unforeseen events or changing business needs might lead to the Product Owner reevaluating the priority of certain Product Backlog items during the Sprint. 
The Developers and the Product Owner have to negotiate whether the team can accommodate the changes given the current Sprint Goal or whether they should be deferred to a future Sprint. On rare occasions, the outcome of this discussion may be a Sprint cancellation. Scope adjustments: Developers may discover that a Product Backlog item is more complex than initially estimated, requiring additional effort or time. The Developers and the Product Owner may need to negotiate a scope adjustment, such as deferring part of the work to a future Sprint. Technical decisions and trade-offs: Developers may encounter technical challenges or constraints that require them to make trade-offs between different solutions. They may need to negotiate with the Product Owner to align on the best approach, considering cost, time, maintainability, and performance factors. Impediments and blockers: Developers may encounter impediments or blockers during the Sprint that impact their ability to complete their work. They may need to negotiate with the Product Owner to find solutions, for example, reprioritizing tasks. Release Planning: The Scrum team members need to negotiate what the team will release to whom and when. The Sprint Review Level Sprint Review: During Sprint Review, the Scrum team and stakeholders review the work completed during the Sprint, discussing any potential improvements or changes to the product. Prioritization of feedback: Stakeholders may provide feedback on the Increment(s), and the Scrum team may need to negotiate the priority and urgency of addressing this feedback. This discussion could involve adding new Product Backlog items, modifying existing items, or ordering the backlog in alignment with the Product Goal. Timeline and release expectations: Stakeholders might have expectations about when certain features or capabilities will be available in the product. Consequently, the Scrum team may need to negotiate changes to their release timeline. Risk mitigation: Stakeholders might identify new risks or raise concerns about existing risks during the Sprint Review. The team and stakeholders will need to negotiate strategies for mitigating these risks and balancing them against other priorities. The Sprint Retrospective Level Reflecting on process improvements: During Sprint Retrospectives, the Scrum team discusses and negotiates potential process improvements and experiments for the next Sprint, aiming to optimize their way of working. Balancing improvement actions with Sprint work: The team may need to negotiate how much time and effort they can allocate to implementing improvement actions during the next Sprint, considering their other commitments and priorities. Prioritizing improvement actions: The Scrum team may identify several potential actions to address the improvement areas. They will need to negotiate and prioritize these actions based on impact, effort, and dependencies. Assigning responsibility and ownership: The team members may need to negotiate who will implement specific improvement actions and ensure their completion during the upcoming Sprints, a directly responsible individual (DRI). Reviewing past decisions and agreements: The team may revisit decisions or agreements made in previous retrospectives, evaluating their effectiveness and discussing whether they should be adjusted or maintained. They may need to negotiate any changes to these past agreements. 
Deciding on team norms and practices: The Scrum team members negotiate and agree on their working agreement, including communication norms, tools, and techniques that best support their collaboration and productivity. Addressing team dynamics and interpersonal issues: The team may discuss concerns about communication, collaboration, or trust among team members. They may need to negotiate how to address these issues through team-building activities, conflict resolution, or coaching. Definition of Done (DoD): The Scrum team negotiates and agrees upon the criteria a Product Backlog item must meet to be considered “done.” Agile Negotiation Examples at the Team-Stakeholder Level The agile negotiation scenarios between Scrum teams and stakeholders are significantly less obvious, as they largely depend on organizational and cultural conditions. Moreover, they rely on the kind of product or service offered. To provide minimum structure, I differentiate between three basic scenarios, from product alignment to coordination (of everyday work) to line management. Of course, there are multiple other areas a systematic approach to cataloging scenarios needs to consider: The Product Alignment Level Aligning on product vision and Product Goals: Stakeholders and the Product Owner may need to negotiate and align on the overall product vision, goals, and strategic direction, ensuring that the Scrum Team’s work supports the organization’s objectives. Prioritizing organizational initiatives: The Scrum team and stakeholders need to negotiate the prioritization of various organizational initiatives that may impact the team’s focus and capacity. Establishing a release plan: The Product Owner, management, and stakeholders collaborate and negotiate a release plan, balancing expectations, resources, and time constraints. Balancing stakeholder interests: The Scrum team must negotiate and manage the interests of multiple stakeholders, ensuring that they address their needs and concerns while focusing on delivering value. Setting and managing expectations: The Scrum team must negotiate with management and stakeholders to establish and manage expectations around delivery timelines, scope, and quality. Quality and compliance: The Product Owner and Developers may need to negotiate with stakeholders to define quality standards, regulatory requirements, or other compliance criteria that the Scrum Team must adhere to during product development. Risk management: The Product Owner and stakeholders may need to collaborate in identifying, assessing, and mitigating risks that might impact the project, negotiating risk prioritization and response strategies. The Coordination Level Stakeholder involvement and communication: The Product Owner and Scrum Master may need to negotiate the level and frequency of stakeholder involvement in the Scrum process, ensuring that stakeholders are informed and engaged while minimizing disruptions to the Scrum team’s work. Resource allocation: The Scrum team may need to negotiate with stakeholders to secure the necessary resources, such as equipment, tools, or budget, to deliver the product effectively. Managing organizational change: The Scrum team may need to negotiate with management and stakeholders to drive and support organizational change, such as adopting agile practices like Scrum, new tools, or structural changes. 
Reporting and metrics: The Scrum team and stakeholders may need to negotiate the types of reports, metrics, or KPIs used to track the Scrum Team’s progress and performance, ensuring that they provide valuable insights and support decision-making without creating undue overhead. Handling escalations and critical issues: When critical issues arise, the Scrum team, management, and stakeholders must negotiate and collaborate to address the situation, balancing the need for swift action with the team’s autonomy and process. Managing dependencies: The Scrum team may need to negotiate with other Scrum teams or departments to manage dependencies, coordinate work, and ensure a smooth delivery process. The Line Management Level Team composition: Scrum Team members and the management may need to negotiate the optimal team composition, considering factors such as skill sets, experience, and team dynamics. Moreover, they need to agree on how to identify new team members. Balancing individual and Scrum team goals: Team members may need to negotiate their personal development goals and aspirations with the collective goals and needs of the Scrum Team with the management. Career development: Line managers and Scrum team members may need to negotiate career development plans, including goals, training opportunities, and potential career paths within the organization. Compensation and benefits: Line managers and Scrum team members may need to negotiate whether existing individual compensation packages, including salary, bonuses, and other benefits, align with the needs of the Scrum team to work as a cohesive unit, not as a group of individuals. Performance management: Line managers, Scrum Masters, or agile leaders may need to negotiate performance expectations, feedback mechanisms, and evaluation criteria for team members. Conclusion Trying to exercise presumed authority over teammates or stakeholders or taking a crowbar as your tool of choice to solve problems and accelerate decisions won’t get you anywhere when working in a complex environment with agile teams. Instead, you better get good at continuous agile negotiation, as solving your customers’ problems in a complex environment requires communication skills, empathy, patience, diplomacy, and professionalism. How are you negotiating with teammates, stakeholders, and the management? Please share your experience with us in the comments.
In today's fast-paced digital world, application performance has become critical to delivering a seamless user experience. Users expect applications to be lightning-fast and responsive, no matter the complexity of the task at hand. To meet these expectations, developers constantly look for ways to improve their application's performance. One solution that has gained popularity in recent years is the integration of MicroStream and Redis. By combining these two cutting-edge technologies, developers can create ultrafast applications that deliver better results. In this post, we will explore the benefits of this integration and how developers can get started with this powerful combination.

MicroStream is a high-performance, in-memory persistence engine designed to improve application performance. MicroStream can store data in memory without needing a mapper or conversion process. This means that developers can work with objects directly without worrying about the mapping process, saving around 90% of the computing power that would otherwise be consumed by mapping.

One of the critical advantages of MicroStream is its speed. By storing data in memory, MicroStream allows faster read and write operations, resulting in improved application performance. MicroStream's data structure is optimized for in-memory storage, enhancing its speed and efficiency. This makes it an ideal solution for applications that require fast response times and high throughput.

Another advantage of MicroStream is its simplicity. With MicroStream, developers can work with objects directly without dealing with the complexities of SQL databases or other traditional persistence solutions. This makes development faster and more efficient, allowing developers to focus on creating great applications instead of struggling with complex data management.

MicroStream's speed, simplicity, and efficiency make it an ideal solution for modern application development. By eliminating the need for a mapper or conversion process, MicroStream saves valuable computing power and resources, resulting in significant cost savings for developers. And with its optimized data structure and in-memory storage capabilities, MicroStream delivers fast and reliable performance, making it a powerful tool for building high-performance applications. That's enough theory: let's move to the next section, where we can finally see both technologies working together in a sample application.

Database Integration

In this section, we will explore the integration of MicroStream and Redis by creating a simple project using Jakarta EE. With the new Jakarta persistence specifications for data and NoSQL, it is now possible to combine the strengths of MicroStream and Redis in a single project. Our project will demonstrate how to use MicroStream as the persistence engine for our data and Redis as a cache for frequently accessed data. Combining these two technologies can create an ultrafast and scalable application that delivers better results. We will walk through setting up the project, configuring MicroStream and Redis, and integrating them with Jakarta EE. We will also provide tips and best practices for working with these technologies and demonstrate how they can be used to create powerful and efficient applications. Overall, this project will serve as a practical example of using MicroStream and Redis together, combined with Jakarta EE, to create high-performance applications. Before wiring in Redis, the short sketch below shows how little code plain MicroStream needs to persist an ordinary object graph.
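It is a hedged, minimal example (the Bookshelf class and the local storage directory are assumptions), with no SQL, no annotations, and no mapping step:

```java
import one.microstream.storage.embedded.types.EmbeddedStorage;
import one.microstream.storage.embedded.types.EmbeddedStorageManager;

import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PlainMicroStreamExample {

    // An ordinary object graph used as the storage root; no entity mapping is required.
    static class Bookshelf {
        final List<String> titles = new ArrayList<>();
    }

    public static void main(String[] args) {
        Bookshelf root = new Bookshelf();

        // Start the storage engine against a local directory (the path is an assumption).
        EmbeddedStorageManager storage = EmbeddedStorage.start(root, Paths.get("microstream-data"));

        root.titles.add("Effective Java");
        storage.storeRoot(); // persists the root object graph, including the new title

        storage.shutdown(); // the CDI producer in the next section uses close() instead
    }
}
```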
Whether you are a seasoned developer or just starting, this project will provide valuable insights and knowledge for working with these cutting-edge technologies. The project is a Maven project where the first step is to put the dependencies besides the CDI and MicroStream: XML <code> <dependency> <groupId>expert.os.integration</groupId> <artifactId>microstream-jakarta-data</artifactId> <version>${microstream.data.version}</version> </dependency> <dependency> <groupId>one.microstream</groupId> <artifactId>microstream-afs-redis</artifactId> <version>${microstream.version}</version> </dependency> </code> The next step is creating both entity and repository; in our scenario, we'll create a Book entity with Library as a repository collection. Java @Entity public class Book { @Id private String isbn; @Column("title") private String title; @Column("year") private int year; } @Repository public interface Library extends CrudRepository<Book, String> { List<Book> findByTitle(String title); } The final step before running is to create the Redis configuration where we'll overwrite the default StorageManager to use the Redis integration, highlighting MicroStream can integrate with several databases such as MongoDB, Hazelcast, SQL, etc. Java @Alternative @Priority(Interceptor.Priority.APPLICATION) @ApplicationScoped class RedisSupplier implements Supplier<StorageManager> { private static final String REDIS_PARAMS = "microstream.redis"; @Override @Produces @ApplicationScoped public StorageManager get() { Config config = ConfigProvider.getConfig(); String redis = config.getValue(REDIS_PARAMS, String.class); BlobStoreFileSystem fileSystem = BlobStoreFileSystem.New( RedisConnector.Caching(redis) ); return EmbeddedStorage.start(fileSystem.ensureDirectoryPath("microstream_storage")); } public void close(@Disposes StorageManager manager) { manager.close(); } } Done, we're ready to go! For this sample, we'll use a simple Java SE; however, you can do it with MicroProfile and Jakarta EE with microservices. Java try (SeContainer container = SeContainerInitializer.newInstance().initialize()) { Book book = new Book("123", "Effective Java", 2002); Book book2 = new Book("1234", "Effective Java", 2019); Book book3 = new Book("1235", "Effective Java", 2022); Library library = container.select(Library.class).get(); library.saveAll(List.of(book, book2, book3)); List<Book> books = library.findByTitle(book.getTitle()); System.out.println("The books: " + books); System.out.println("The size: " + books.size()); } Conclusion In conclusion, MicroStream integration with multiple databases is a promising approach to designing high-performance data management systems. This project explores various integration techniques to connect microstream with databases such as MySQL, MongoDB, Oracle, and PostgreSQL. The system will be designed and implemented using a combination of programming languages such as Java, Python, and JavaScript. The project will also provide documentation, training materials, and benchmark tests to ensure the system meets the specified requirements and delivers user value. By leveraging the power of MicroStream technology and integrating it with different databases, organizations can build robust, scalable, and efficient data management systems that can handle large amounts of data and complex data structures. This approach can give organizations a competitive edge by enabling them to process data faster, make better-informed decisions, and enhance operational efficiency. 
Overall, MicroStream integration with multiple databases is a promising approach that can benefit organizations in various industries. With the right design, implementation, and testing, organizations can leverage this approach to build data management systems that meet their unique business needs and drive success. Reference: Source code
Flask is a popular web framework for building web applications in Python. Docker is a platform that allows developers to package and deploy applications in containers. In this tutorial, we'll walk through the steps to build a Flask web application using Docker.

Prerequisites

Before we begin, you must have Docker installed on your machine. You can download the appropriate version for your operating system from the official Docker website. Additionally, a basic understanding of Flask and Python will help.

Creating a Flask Application

The first step is to create a Flask application. We'll create a simple "Hello, World!" application for this tutorial. Create a new file called app.py and add the following code:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    return 'Hello, World!'

if __name__ == '__main__':
    # Bind to 0.0.0.0 so the app is reachable from outside the container
    app.run(host='0.0.0.0', port=5000)
```

Save the file and navigate to its directory in a terminal. You'll also need a requirements.txt file in the same directory listing the dependencies to install; for this tutorial it contains a single line: flask.

Creating a Dockerfile

The next step is to create a Dockerfile. A Dockerfile is a script that describes the environment in which the application will run. We'll use the official Python 3.8 image as the base image for our Docker container. Create a new file called Dockerfile and add the following code:

```dockerfile
FROM python:3.8-slim-buster

# Set the working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Run the application
CMD [ "python", "app.py" ]
```

Here is what each instruction does:

FROM python:3.8-slim-buster: This sets the base image for our Docker container to the official Python 3.8 image.
WORKDIR /app: This sets the working directory inside the container to /app.
COPY requirements.txt .: This copies the requirements.txt file from our local machine to the /app directory inside the container.
RUN pip install --no-cache-dir -r requirements.txt: This installs the dependencies listed in requirements.txt.
COPY . .: This copies the entire local directory to the /app directory inside the container.
CMD [ "python", "app.py" ]: This sets the command to run when the container starts to python app.py.

Save the Dockerfile and navigate to its directory in a terminal.

Building the Docker Image

The next step is to build a Docker image from the Dockerfile. Run the following command to build the image:

```shell
docker build -t my-flask-app .
```

This command builds an image named my-flask-app from the Dockerfile in the current directory. The . at the end of the command specifies that the build context is the current directory.

Starting the Docker Container

Now that we have a Docker image, we can start a container from it. Run the following command to start a new container from the my-flask-app image and map port 5000 on the host to port 5000 in the container:

```shell
docker run -p 5000:5000 my-flask-app
```

Testing the Flask Application

Finally, open your web browser and navigate to http://localhost:5000. You should see the "Hello, World!" message displayed in your browser, indicating that the Flask application is running inside the Docker container.

Customizing the Flask Application

You can customize the Flask application by modifying the app.py file and rebuilding the Docker image. For example, you could modify the hello function to return a different message:

```python
@app.route('/')
def hello():
    return 'Welcome to my Flask application!'
```

Save the app.py file and rebuild the Docker image using the docker build command from earlier.
Once the image is built, start a new container using the docker run command from earlier. When you navigate to http://localhost:5000, you should see the updated message displayed in your browser.

Advantages

Docker simplifies the process of building and deploying Flask applications, as it provides a consistent and reproducible environment across different machines and operating systems.
Docker allows for easy management of dependencies and versions, as everything needed to run the application is contained within the Docker image.
Docker facilitates scaling and deployment of the Flask application, allowing for the quick and easy creation of new containers.

Disadvantages

Docker adds an additional layer of complexity to the development and deployment process, which may require additional time and effort to learn and configure.
Docker may not be necessary for small or simple Flask applications, as the benefits may not outweigh the additional overhead and configuration.
Docker images and containers can take up significant disk space, which may be a concern for applications with large dependencies or for machines with limited storage capacity.

Conclusion

In this tutorial, we've walked through the steps to build a Flask web application using Docker. We've created a simple Flask application, written a Dockerfile to describe the environment in which the application will run, built a Docker image from the Dockerfile, started a Docker container from the image, and tested the Flask application inside the container. With Docker, you can easily package and deploy your Flask application in a consistent and reproducible manner, making it easier to manage and scale your application.
Setting up CI/CD pipelines for Kubernetes is complex. Post setup, managing the CI/CD can become quite challenging when your team and infrastructure grow quickly. An efficient and fully automated CI/CD plays an important role in ensuring you ship your features faster. By leveraging CI/CD, you can guarantee that application teams ship features faster while following best practices for the code base. In this micro-blog, we will see how easily one can create and manage CI/CD pipelines with Devtron, but before doing that, let's do a quick refresher on CI/CD.

What Is CI/CD?

CI stands for "Continuous integration," which is the stage where we validate and test the changes pushed by developers. It is where you run all your test cases, i.e., unit and integration tests. After we have validated and tested the codebase, we build the software artifacts in this step, e.g., a container image or a binary executable.

CD stands for "Continuous delivery," which is when you push your software artifacts to the production/staging environment in an automated process. In this context, after building a container image and pushing it to the container registry, we deploy the same image in the Kubernetes cluster with the help of Devtron.

What Is Kubernetes Native CI/CD?

Kubernetes-native CI/CD runs the software delivery workflow inside the Kubernetes cluster itself. The benefit of using Kubernetes-native CI/CD is that your build step happens inside a pod, and you don't need to provision or connect any other compute resources externally for your CI. From a security perspective, this method is recommended as all of your secrets and external credentials stay within the cluster.

Kubernetes Native CI/CD With Devtron

When you're managing your pipelines with YAML, you need to write the workflow and update it repeatedly. Understanding YAML, examining the build process, learning different Linux commands, deploying code manually, and learning Kubernetes is hard for a developer. They need something intuitive and simple that doesn't require domain expertise. Even if you don't know much about Kubernetes, you should be able to develop, execute, and trigger your builds and deployments. This is what we are trying to solve with Devtron. It gives you an intuitive dashboard to help you quickly set up and manage your CI/CD pipelines on Kubernetes.

Let's install Devtron and set up some CI/CD pipelines for our applications. For installing Devtron, refer to the installation documentation. After installing Devtron, we will port-forward the Devtron service to get the dashboard URL since we are installing it on a local machine.

```shell
kubectl -n devtroncd port-forward service/devtron-service 8000:80 &
```

After this, you should be able to access the Devtron dashboard on localhost port 8000. For demonstrating the CI/CD pipeline, let's use the Node.js demo application provided by Devtron in its GitHub repository. Next, you need to set the global configurations for your application; refer to the documentation for setting the global configuration.

Step 1: Enabling GitOps

To enable GitOps, go to global configuration and use any Git provider for authentication. In our case, we have used GitHub. It is recommended to create a new GitHub organization for all your GitOps configs.

Step 2: Creating an Application

Now let's move ahead and create our first application on Devtron. Provide the app name (here, demo-application), assign it to a project (here, devtron-demo), and then click on Create App.
Step 3: Adding GitHub Repository and Build Configuration

The next step is to add the Git repo URL where your source code is hosted. If you want to deploy an application that is in a private repo, explore the Git Accounts docs. Once we have set up our repository, we need to set the build configuration. The build configuration is required to pull and push your container images. You can change the registry to use Docker, ECR, GCR, etc., which can be configured from container registries under Global Configurations. We have now integrated Buildpacks, with which you can build OCI-compliant container images directly from your code. If you want to build a platform-specific image, you can configure that in the advanced options in Build Configurations.

Step 4: Configuring Base Deployment Template

The Base Deployment Template is where all the magic happens. Devtron exposes a fine-grained configuration, abstracting away the Kubernetes config complexities, and packages the template as a Helm chart. The deployment template consists of almost all the major configurations required for a production-grade microservice, and it comes with two display options: Basic (GUI mode) and Advanced (YAML configs). You can configure ContainerPort, set Requests and Limits for your deployment, provide environment variables, etc. There's also an advanced option from which you can get the detailed YAML configuration that Devtron will use during deployment and override it if needed. You can configure Horizontal Pod Autoscaling, Ingress, Security Context, KEDA autoscaling, Volume Mounts, and many more Kubernetes configurations with just a few tweaks to the predefined template. With Devtron's deployment template, you don't have to worry about writing tons of YAML files for your Kubernetes configurations.

Step 5: Creating and Triggering a CI/CD Pipeline

Once you have configured the deployment template, you are ready to build your CI/CD workflow. Let's go ahead and configure continuous integration. To learn more about the different ways of creating pipelines, read the workflow editor documentation. In this example, we will build and deploy from the Devtron dashboard.

Now, let's create our CI build pipeline and configure the pipeline details. In the CI pipeline, you can configure vulnerability scanning, pre/post-build stages, the source type for building images (Branch Fixed, Regex, Pull Request, or Tag Creation), and a lot more. After creating the build pipeline, let's create the deployment pipeline. Use the + icon on the same page to add your deployment pipeline. With Devtron, you can create as many pipelines as you want, be they sequential pipelines or deployments to multiple environments from the same build. In the deployment pipeline, you can configure pre/post-deployment stages and use out-of-the-box integrations of deployment strategies, i.e., Rolling, Recreate, Blue-Green, and Canary. It also allows you to choose different deployment strategies for different environments in the same application. The default deployment strategy is Rolling. In this blog, we will proceed with the Blue-Green deployment strategy. You can check out the deployment strategy configurations and make changes depending on your use case; the same approach applies to all the other deployment strategies. After configuring the CI and CD pipelines, the complete workflow is visible in the workflow editor.
We can create multiple workflows with different configs for different environments and build types in the same application. Isn't that interesting? Now that the pipelines are created, we need to trigger them, and for that we go to the Build & Deploy section. To build an image, click on Select Material, select the commit you want to build, and click on Start Build. Once the build is initiated, we can check the logs of the build process, and when the build succeeds, we can verify the built image in the artifacts section. The dashboard also lets you check previous builds and all the necessary audit info.

Let's trigger the deployment pipeline. After the build pipeline succeeds, there will be an option to select the image on the Build & Deploy page. Select the image and then click on Deploy. We can also check the commit and security details against which the deployment is being initiated. Once the deployment starts, Devtron gives you details of all stages along with the Kubernetes configurations for each deployment. In the CD pipeline as well, we get all the audit info regarding the deployment.

Step 6: App Details

Once your application is deployed, you can view detailed information about it in App Details. All the health-related information about your workloads and application metrics can be easily monitored from the App Details tab. After the application is successfully deployed, it gives you a 360-degree view of the application and allows you to perform different pod operations, such as viewing logs, exec-ing into a terminal, and checking manifests and events; it also comes with a Logs Analyzer that lets you grep across multi-container logs. It also shows all the Kubernetes resources associated with a particular application in a grouped manner, which comes in very handy for debugging and observability.

Conclusion

In this blog, we have learned about Kubernetes CI/CD and how easily it can be set up using Devtron in just 3 minutes, with all the best practices and GitOps. One of the main benefits of a quick Kubernetes CI/CD setup is that it can speed up the whole release process by automating your build and continuously testing the codebase. Developers don't need to build and test manually. A fast, reliable, and efficient Kubernetes CI/CD will lead to better products and happier users.