Apache Flink is a distributed processing engine that can process both batch and streaming data. In this tutorial, we will write a Flink word count job in Scala.
We will use IntelliJ IDEA to write the code and build the jar, and Maven to set up the Flink dependencies. So let's get started.
Set up the Flink development environment
Before starting to write Flink code, make sure the following tools are installed and configured on your system.
Set up IntelliJ IDEA for Flink
You can download IntelliJ IDEA for your operating system from the JetBrains website.
Set up Maven for Flink
Since we will use Maven to manage the Flink dependencies for the word count project, make sure Maven is installed on your machine.
On macOS, you can install Maven with Homebrew using the command below:
brew install maven
Set up Flink locally
Follow the official Flink installation guide to install Flink on your system. Once Flink is installed and a local cluster is started, the web UI can be accessed at localhost:8081.
Flink project template for Scala
Flink provides a quickstart Maven archetype that generates a project template for your Flink job. Run the command below to generate the quickstart Flink project:
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-scala \
  -DarchetypeVersion=1.14.4
Provide the groupId, artifactId, and package name.
Define value for property 'groupId': org.apache.flink
Define value for property 'artifactId': flink-scala-wc
Define value for property 'version' 1.0-SNAPSHOT: :
Define value for property 'package' org.apache.flink: :
Confirm properties configuration:
groupId: org.apache.flink
artifactId: flink-scala-wc
version: 1.0-SNAPSHOT
package: org.apache.flink
 Y: : y
After you enter the above details, a Maven project named flink-scala-wc will be created. Now open this project in IntelliJ IDEA.
Flink word count example in Scala
In this section, we will learn how to write a word count application in Scala.
Open the flink-scala-wc project that was generated with the Maven archetype. Delete the existing Scala files and create a new Scala class.
Enter wordCount as the class name, select Object as the kind, and click OK.
Paste the code below into the wordCount file:
package org.apache.flink

import org.apache.flink.api.scala._
import org.apache.flink.api.java.utils.ParameterTool

object wordCount {
  def main(args: Array[String]): Unit = {
    // parse --input and --output from the command line
    val params: ParameterTool = ParameterTool.fromArgs(args)

    // set up the batch execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // make parameters available in the web interface
    env.getConfig.setGlobalJobParameters(params)

    // read the input file line by line
    val text = env.readTextFile(params.get("input"))

    val counts = text
      .flatMap(value => value.split("\\s+")) // split each line into words
      .filter(value => value.nonEmpty)        // drop empty tokens
      .map(value => (value, 1))               // pair each word with a count of 1
      .groupBy(0)                             // group by the word
      .sum(1)                                 // sum the counts per word

    // write one "word count" pair per line, separated by a space
    counts.writeAsCsv(params.get("output"), "\n", " ")

    env.execute("Scala WordCount Example")
  }
}
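To make the transformation chain easier to follow, here is a minimal sketch of the same logic written with plain Scala collections and no Flink dependency. The names WordCountSketch and countWords are hypothetical and used only for illustration, and the filter step is there to drop the empty token that split can produce when a line starts with whitespace.

```scala
// Plain Scala sketch of the word-count logic, with no Flink dependency.
// WordCountSketch and countWords are hypothetical names used only for illustration.
object WordCountSketch {

  // Follows the same steps as the Flink job: flatMap, map, groupBy, sum.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split each line on whitespace
      .filter(_.nonEmpty)         // drop empty tokens caused by leading whitespace
      .map(word => (word, 1))     // pair every word with a count of 1
      .groupBy(_._1)              // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per word

  def main(args: Array[String]): Unit = {
    val counts = countWords(Seq("to be or not to be", "that is the question"))
    // Print one "word count" pair per line, separated by a space.
    counts.toSeq.sortBy(_._1).foreach { case (word, count) => println(s"$word $count") }
  }
}
```

In the Flink job, groupBy(0) and sum(1) play the role of the groupBy and per-group sum here, except that Flink can distribute the work across parallel tasks.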
Change the main class name in pom.xml to match your class name:
<mainClass>org.apache.flink.wordCount</mainClass>
Now run Maven clean and compile (mvn clean compile).
Make sure the compilation finishes without errors.
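Before packaging, it can also help to run the same transformations on a few in-memory lines from inside the IDE. The sketch below assumes the Flink dependencies from the generated pom.xml are on the run classpath; WordCountLocalCheck and run are hypothetical names, and the sample sentences are made up for illustration.

```scala
import org.apache.flink.api.scala._

// Hypothetical object: runs the word-count transformations on in-memory data.
object WordCountLocalCheck {

  // Builds a DataSet from the given lines and returns the counts as a Map.
  def run(lines: Seq[String]): Map[String, Int] = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // fromCollection creates a small DataSet without reading any file.
    val text = env.fromCollection(lines)

    val counts = text
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .groupBy(0)
      .sum(1)

    // collect() triggers execution and returns the results to the client,
    // so no env.execute() call is needed here.
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val result = run(Seq("to be or not to be", "that is the question"))
    result.toSeq.sortBy(_._1).foreach { case (word, count) => println(s"$word $count") }
  }
}
```

Because collect() brings every record back to the client, this kind of check is only suitable for small inputs.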
Flink word count jar
In this section, we will generate the jar file for the word count job, which is required to run the Flink application.
There are several ways to generate a jar file. One of the easiest is to use IntelliJ IDEA itself. Before generating the jar, make sure the project compiles without errors.
To generate the jar, run the Maven package phase from the IntelliJ IDEA Maven tool window (or run mvn clean package from the project root).
Once the package command finishes successfully, the jar will be generated under the target folder with the name
flink-scala-wc-1.0-SNAPSHOT.jar
Run the Flink word count job in Scala
We will now use the above jar file to submit the Flink job. The word count job takes two parameters:
input: path of the text file to read the data from
output: path where the result is written by writeAsCsv, one word and its count per line, separated by a space
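Note that ParameterTool.get returns null when a key is missing, so a forgotten --input or --output only shows up later as a less obvious failure. As a hedged sketch, the arguments could be validated up front with getRequired, which throws a RuntimeException for a missing key. JobArgs and inputAndOutput are hypothetical names, and the /tmp paths are placeholders.

```scala
import org.apache.flink.api.java.utils.ParameterTool

// Hypothetical helper object, shown only to illustrate argument handling.
object JobArgs {

  // Returns (input, output) and fails fast when either argument is missing.
  def inputAndOutput(args: Array[String]): (String, String) = {
    val params = ParameterTool.fromArgs(args)
    // getRequired throws a RuntimeException if the key is absent,
    // whereas get would return null.
    (params.getRequired("input"), params.getRequired("output"))
  }

  def main(args: Array[String]): Unit = {
    // Placeholder paths, used only for illustration.
    val (input, output) =
      inputAndOutput(Array("--input", "/tmp/in.txt", "--output", "/tmp/out.txt"))
    println(s"input=$input output=$output")
  }
}
```

The same two getRequired calls could replace params.get("input") and params.get("output") in the word count job if you prefer an early, explicit error.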
Now run the command below from the Flink installation directory to submit the job.
./bin/flink run --detached ~/flink-scala-wc-1.0-SNAPSHOT.jar --input ~/flink/flink-1.14.4/README.txt --output ~/flink_out.txt
Conclusion
I hope you liked this short tutorial on Flink word count. In this blog, we learned how to create a Flink word count application in Scala, how to generate the jar file, and how to submit the jar as a Flink job.
Please do let me know if you are facing any issues while following along.