Spark + Elasticsearch Error: Failed to find data source


1. Exception Description

After deploying the service to a new server, it threw an Elasticsearch-related ClassNotFoundException:

java.lang.ClassNotFoundException: Failed to find data source: es. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:639)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
    at com.surfilter.task.archival.service.LoadData.loadAbnormalUrlData(LoadData.scala:21)
    at com.surfilter.task.archival.ArchivalServlet$.main(ArchivalServlet.scala:16)
    at com.surfilter.task.archival.ArchivalServlet.main(ArchivalServlet.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: es.DefaultSource
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:622)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:622)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:622)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:622)
    at scala.util.Try.orElse(Try.scala:84)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:622)
    ... 15 more
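For context, the lookup that fails is triggered by a DataFrameReader.load call of roughly the shape below. This is a minimal sketch, not the actual LoadData.loadAbnormalUrlData code; the ES host, port, and index name are assumptions.

import org.apache.spark.sql.SparkSession

object LoadDataSketch {
  def main(args: Array[String]): Unit = {
    // ES connection settings; host, port, and index name here are assumptions
    val spark = SparkSession.builder()
      .appName("ArchivalServlet")
      .config("es.nodes", "127.0.0.1")
      .config("es.port", "9200")
      .getOrCreate()

    // "es" is the short data source name registered by the elasticsearch-spark
    // connector. Without the connector jar on the classpath, Spark cannot
    // resolve it and instead looks for a class literally named
    // "es.DefaultSource", which produces the ClassNotFoundException above.
    val df = spark.read
      .format("es")
      .load("abnormal_url")
    df.show()
  }
}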

2. Solution

The cause is that the Spark nodes are missing the Elasticsearch connector jar.

Copy the elasticsearch-spark-xx_x.xx-x.xx.x.jar that the code depends on into the jars directory of every Spark node. In my test, the fix took effect without restarting the service.

The jar I used here is elasticsearch-spark-20_2.11-7.16.3.jar.
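Alternatively, instead of copying the jar into every node's jars directory, the connector can be shipped with the job, for example via the spark.jars setting (or the equivalent --jars option of spark-submit). A minimal sketch, where the local jar path is an assumption:

import org.apache.spark.sql.SparkSession

// Distribute the connector jar with the job instead of pre-installing it
// on each node; the path below is an assumption.
val spark = SparkSession.builder()
  .appName("ArchivalServlet")
  .config("spark.jars", "/opt/jars/elasticsearch-spark-20_2.11-7.16.3.jar")
  .getOrCreate()

// With the jar on the classpath, both format("es") and the fully qualified
// format("org.elasticsearch.spark.sql") resolve to the same data source.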

