白日依山尽,黄河入海流。欲穷千里目,更上一层楼。 -- 唐·王之涣

sqoop export导入任务失败但部分数据入库成功能问题

背景

目前借助于 Azkaban 进行大数据相关任务调度。任务执行失败会收到响应的告警。本次失败登录 Azkaban 错误日志分析,核心错误 Job failed as tasks failed. failedMaps:1 failedReduces:0

但是Azkaban 页面错误信息显示不是非常详细。没有看到失败的具体原因是什么。同步收到开发反馈,对应的业务数据相比之前少了一半。

任务执行失败,开发反馈有数据,但是数据相比之前少了一半。

问题分析

1、找到对应azkaban任务脚本,看到是利用 sqoop export 从hive从导出数据到mysql数据库表

2、查看 mysql 库表记录,只有100条,但是正常情况应该是200条记录

3、查看hive中对应的表,显示结果是200条。所以源头数据应该是没有问题的

4、排查相关错误日志未发现有用信息。因为200条记录,导入了100条,那么是不是第101条数据有问题呢?

5、在hive执行分页查询 select * from xxx where dt='2022-09-15' limit 100, 1; 刚好看到第101条记录的部分字段合并在了一起,因为是0和Null合并,很明显

6、这个时候怀疑是数据源有问题,两个字段显示在一起 导致实际hive表中的字段个数和mysql表中的字段个数不一致导致的

7、又是各种找日志分析,未发现到有用信息,在另外一个主机上用普通用户登录hive查询的时候虽然有告警信息,但是第101行记录的显示确实正常的,没有出现字段显示在一起 的问题,那么就能真正的排除数据源的问题

8、这个时候暂停转而去了解 sqoop 的用法,看是否哪里用的不对,是在不同的用户下sqoop查询字段获取的问题,做实验也未验证到结果

9、然后再了解sqoop的过程中,get到sqoop export也是把任务转化成Hadoop的mapreduct job去执行的。其实在azkaban的错误日志中就有显示 INFO mapreduce.Job: Job job_1659629210017_4904 failed with state FAILED due to: Task failed task_1659629210017_4904_m_000000 只是当初没有注意到罢了,部分详细错误日志详见文末

10、了解大数据的都知道,mapreduct job 最终的调度和任务执行都是通过Yarn来进行的,在Hadoop job页面可以找到 报错的JOBID,进而可以找到该任务是调度到那个 NodeManager 去执行的,

11、登录执行的NodeManager 通过如下命令去获取详细的执行日志

1
yarn logs -applicationId application_1659629210017_4904

看到有错误信息

1
2022-10-08 16:34:58,417 ERROR [Thread-11] org.apache.sqoop.mapreduce.AsyncSqlOutputFormat: Got exception in update thread: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'receiveDiscordIds' at row 1

目前数据库中该字段是text类型,长度不够,那就修改为 mediumtext类型 ,清理掉mysql中异常数据,重新跑任务成功,数据200条

总结

1、azkaban 任务执行失败,首先看azkaban 提供的错误信息,一般在脚本中会在关键位置输出日志。有助于排查问题

2、如果azkaban 没有找到问题根源,大数据任务最终都是通过Yarn调用和执行的,可以通过Hadoop Job 管理页面找到调度的节点

3、在Yarn节点上执行 yarn logs -applicationId application_id 可以找到更加详细的执行过程中的日志

扩展 Yarn 架构

img

Yarn 四大组件的介绍

ResourceManager:

1、处理客户端请求
2、启动/监控ApplicationMaster
3、监控NodeManager
4、资源分配与调度

APPlicationMaster:

1、程序切分
2、为应用程序申请资源,并分配任务
3、任务监控与容错

NodeManager:

1、单个节点上资源管理
2、处理来自ResourceManager的命令
3、处理来自ApplicationMaster的命令

Container:

对任务运行环境的抽象,封装了CPU、内存等多维资源以及环境变量、启动命令等任务运行相关信息’

附加

部分失败任务的日志

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
15-09-2022 09:22:31 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:31 INFO mapreduce.Job:  map 0% reduce 0%
15-09-2022 09:22:43 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:43 INFO mapreduce.Job: map 100% reduce 0%
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 INFO mapreduce.Job: Job job_1659629210017_4904 failed with state FAILED due to: Task failed task_1659629210017_4904_m_000000
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Job failed as tasks failed. failedMaps:1 failedReduces:0
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO -
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 INFO mapreduce.Job: Counters: 8
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Job Counters
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Failed map tasks=1
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Launched map tasks=1
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Rack-local map tasks=1
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Total time spent by all maps in occupied slots (ms)=469680
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Total time spent by all reduces in occupied slots (ms)=0
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Total time spent by all map tasks (ms)=9785
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Total vcore-milliseconds taken by all map tasks=9785
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Total megabyte-milliseconds taken by all map tasks=480952320
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 46.0112 seconds (0 bytes/sec)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 INFO mapreduce.ExportJobBase: Exported 0 records.
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 ERROR mapreduce.ExportJobBase: Export job failed!
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - 22/09/15 09:22:44 ERROR tool.ExportTool: Error during export:
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - Export job failed!
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:445)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
15-09-2022 09:22:44 CST export_dmr_deceitful_delivery INFO - at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

参考地址:

https://blog.csdn.net/NDF923/article/details/122067125
https://blog.csdn.net/qq_43842093/article/details/122401599

作者

Colin

发布于

2022-10-08

许可协议