Spring Batch: 大数据量批量并行处理框架
Spring Batch Documentation:
http://static.springsource.org/spring-batch/reference/index.html
Use Cases for Spring Batch:
http://static.springsource.org/spring-batch/cases/index.html
Spring Batch Tutorial:
http://www.mkyong.com/tutorials/spring-batch-tutorial/comment-page-1/#comment-138186
Anything Spring Batch can do, Hadoop can also do. But Spring Batch is a Java batch-job framework: its job is to manage your jobs for you (monitoring, flow control, restart, and so on), and it also serves as a de facto standard, so you don't miss the many details you would overlook writing such a framework yourself. Moving Spring Batch's workload into Hadoop, though, would likely be overkill for Hadoop. Spring Batch is quite good for batch processing; Hadoop is better suited to data mining and the like. In short, Spring Batch fits data processing of moderate scale, while Hadoop is for genuinely large-scale computation and processing.
The Java EE 7 batch framework (JSR 352) is essentially aligned with Spring Batch. For a comparison of the two, see:
https://blog.codecentric.de/en/2013/07/spring-batch-and-jsr-352-batch-applications-for-the-java-platform-differences/
About Steps:
http://docs.spring.io/spring-batch/reference/html/configureStep.html
http://www.mkyong.com/spring-batch/spring-batch-hello-world-example/
http://java.dzone.com/articles/chunk-oriented-processing
Quote:
Spring Batch provides two kinds of step:
1. Chunk-oriented task, also called a READ-PROCESS-WRITE task.
2. TaskletStep-oriented task, also called a single-operation task (i.e. the Tasklet interface). The Tasklet is a simple interface that has one method, execute, which will be called repeatedly by the TaskletStep until it either returns RepeatStatus.FINISHED or throws an exception to signal a failure. Each call to the Tasklet is wrapped in a transaction (that is, all DB operations within one TaskletStep invocation run in a single transaction, so you need not worry about a failure during the invocation corrupting the data). Tasklet implementors might call a stored procedure, a script, or a simple SQL update statement. To create a TaskletStep, the 'ref' attribute of the <tasklet/> element should reference a bean defining a Tasklet object; no <chunk/> element should be used within the <tasklet/>.
In Spring Batch, a job consists of many steps, and each step consists of a READ-PROCESS-WRITE task or a single-operation task (tasklet).
1 Job = Many Steps.
1 Step = 1 READ-PROCESS-WRITE or 1 Tasklet. (Exactly one: either a chunk-oriented task or a TaskletStep-oriented task.)
Job = {Step 1 -> Step 2 -> Step 3} (chained together).
As a rule of thumb: for a step that fits the IN-PROCESS-OUT model, use a chunk-oriented task; for a step that needs only IN or only OUT, or neither (e.g. one that just cleans up resources or truncates a DB table), use a TaskletStep-oriented task.
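The two step types can be sketched in XML configuration roughly as follows (a minimal sketch; the bean ids reader, processor, writer, and cleanupTasklet are hypothetical placeholders, not taken from any of the linked examples):

```xml
<job id="sampleJob" xmlns="http://www.springframework.org/schema/batch">
    <!-- Chunk-oriented step: READ-PROCESS-WRITE, committed every 10 items -->
    <step id="readProcessWrite" next="cleanup">
        <tasklet>
            <chunk reader="reader" processor="processor" writer="writer"
                   commit-interval="10"/>
        </tasklet>
    </step>
    <!-- TaskletStep-oriented step: a single operation, e.g. truncating a table.
         Note the 'ref' attribute and the absence of a <chunk/> element. -->
    <step id="cleanup">
        <tasklet ref="cleanupTasklet"/>
    </step>
</job>
```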
Using Spring Batch together with Spring Integration:
http://blog.springsource.org/2010/02/15/practical-use-of-spring-batch-and-spring-integration/
http://static.springsource.org/spring-batch-admin/trunk/spring-batch-integration/
Open questions:
1. Passing data between steps? One approach that works in a single-threaded setup:
http://wangxiangblog.blogspot.com/2013/02/spring-batch-pass-data-across-steps.html
2. Can a processor handle items in batches? And likewise, can items be read in batches? Related:
http://forum.spring.io/forum/spring-projects/batch/63873-itemreader-returning-one-list
3. If one item is read but the processor turns it into multiple items (i.e. the processor's input is one object but its output is a list), how is the result handed to the writer? Related:
http://forum.spring.io/forum/spring-projects/batch/111650-itemprocessor-receiving-one-item-returning-more-than-one
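For question 3, the usual answer in threads like the one linked is to type the writer on lists and flatten inside it. Here is a self-contained sketch using simplified stand-in interfaces rather than the real org.springframework.batch APIs (the interface shapes mirror ItemProcessor<I, O> and ItemWriter<T>, but everything here is an illustrative assumption):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the real Spring Batch interfaces.
interface Processor<I, O> { O process(I item); }
interface Writer<T> { void write(List<T> chunk); }

public class OneToManySketch {
    // A processor whose output type is a List: one input item expands to many.
    static Processor<String, List<String>> splitter =
            line -> Arrays.asList(line.split(","));

    // A writer typed on List<String>: it receives a chunk of lists and
    // flattens them before doing the actual write.
    static class FlatteningWriter implements Writer<List<String>> {
        final List<String> written = new ArrayList<>();
        public void write(List<List<String>> chunk) {
            for (List<String> items : chunk) {
                written.addAll(items); // flatten, then write as usual
            }
        }
    }

    // Drive one chunk through processor and writer, returning what was written.
    public static List<String> run(List<String> input) {
        FlatteningWriter writer = new FlatteningWriter();
        List<List<String>> chunk = new ArrayList<>();
        for (String line : input) {
            chunk.add(splitter.process(line));
        }
        writer.write(chunk);
        return writer.written;
    }
}
```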
Spring Batch ref:
A Job has one to many Steps, each of which has exactly one ItemReader, ItemProcessor, and ItemWriter. A job needs to be launched (JobLauncher), and meta data about the currently running process needs to be stored (JobRepository):
Batch Stereotypes (Chapter 3. The Domain Language of Batch)
A JobLauncher uses the JobRepository to create new JobExecution objects and run them. Job and Step implementations later use the same JobRepository for basic updates of the same executions during the running of a Job. The basic operations suffice for simple scenarios, but in a large batch environment with hundreds of batch jobs and complex scheduling requirements, more advanced access of the meta data is required (4.5. Advanced Meta-Data Usage)
2.4. Meta Data Access Improvements
3.1. Job
Spring Batch uses a 'Chunk Oriented' processing style within its most common implementation. Chunk oriented processing refers to reading the data one at a time, and creating 'chunks' that will be written out, within a transaction boundary (i.e. one chunk-oriented read/write cycle runs inside a single transaction!). One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.
5.1. Chunk-Oriented Processing
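The chunk-oriented loop described above can be sketched in plain Java. This is a conceptual sketch only: the real loop lives inside Spring Batch, and the per-chunk transaction is simulated here by grouping each commit's writes into one sub-list:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class ChunkLoopSketch {
    /**
     * Reads items one at a time, processes each, and writes them out in
     * chunks of commitInterval items. Each returned sub-list corresponds
     * to one transaction commit in the real framework.
     */
    public static <I, O> List<List<O>> run(Iterator<I> reader,
                                           Function<I, O> processor,
                                           int commitInterval) {
        List<List<O>> commits = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {                    // ItemReader: one item at a time
            O processed = processor.apply(reader.next()); // ItemProcessor
            chunk.add(processed);                     // aggregate into the chunk
            if (chunk.size() == commitInterval) {     // commit interval reached:
                commits.add(chunk);                   // ItemWriter writes the chunk,
                chunk = new ArrayList<>();            // then the transaction commits
            }
        }
        if (!chunk.isEmpty()) {
            commits.add(chunk);                       // final partial chunk
        }
        return commits;
    }
}
```

Running five items with a commit interval of 2 produces three commits: two full chunks and one partial final chunk.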
APIs:
Quote:
Job - A Job is an entity that encapsulates an entire batch process.
JobInstance - A JobInstance refers to the concept of a logical job run. Think of it as: JobInstance = Job + JobParameters.
JobExecution - A JobExecution refers to the technical concept of a single attempt to run a Job. An execution may end in failure or success, but the JobInstance corresponding to a given execution will not be considered complete unless the execution completes successfully.
JobParameters - JobParameters is a set of parameters used to start a batch job. "how is one JobInstance distinguished from another?" The answer is: JobParameters.
Job conclusion - A Job defines what a job is and how it is to be executed, and JobInstance is a purely organizational object to group executions together, primarily to enable correct restart semantics. A JobExecution, however, is the primary storage mechanism for what actually happened during a run.
Step - A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing. As with Job, a Step has an individual StepExecution that corresponds with a unique JobExecution.
StepExecution - A StepExecution represents a single attempt to execute a Step. A new StepExecution will be created each time a Step is run, similar to JobExecution. However, if a step fails to execute because the step before it fails, there will be no execution persisted for it. A StepExecution will only be created when its Step is actually started.
Tasklet - a simple single-method interface (execute), called repeatedly by a TaskletStep until it returns RepeatStatus.FINISHED or throws an exception; see the step-type discussion above.
Chunk - the group of items read and processed within one transaction boundary and written out together once the commit interval is reached; see Chunk-Oriented Processing above.
ExecutionContext - An ExecutionContext is a collection of key/value pairs that are persisted by the framework and provide a place to store persistent data that is scoped to a StepExecution or JobExecution. This storage is useful for example in stateful ItemReaders where the current row being read from needs to be recorded.
JobListener -
JobRepository - JobRepository is the persistence mechanism for all of the Stereotypes such as JobInstance/JobParameters/JobExecution/StepExecution/ExecutionContext and so on. It provides CRUD operations for JobLauncher, Job, and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository, and during the course of execution StepExecution and JobExecution implementations are persisted by passing them to the repository.
JobLauncher - JobLauncher represents a simple interface for launching a Job with a given set of JobParameters. It is expected that implementations will obtain a valid JobExecution from the JobRepository and execute the Job.
JobExplorer - provides the ability to query the repository for existing executions. You can think of it as a read-only version of the JobRepository.
JobRegistry - A JobRegistry (and its parent interface JobLocator) is not mandatory, but it can be useful if you want to keep track of which jobs are available in the context. It is also useful for collecting jobs centrally in an application context when they have been created elsewhere (e.g. in child contexts). Custom JobRegistry implementations can also be used to manipulate the names and other properties of the jobs that are registered.
JobOperator - the JobRepository provides CRUD operations on the meta-data, and the JobExplorer provides read-only operations on the meta-data. However, those operations are most useful when used together to perform common monitoring tasks such as stopping, restarting, or summarizing a Job, as is commonly done by batch operators. Spring Batch provides for these types of operations via the JobOperator interface.
ItemReader - ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time. When the ItemReader has exhausted the items it can provide, it will indicate this by returning null. The basic contract of the ItemReader is that it is forward only.
ItemProcessor - ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item and the ItemWriter writes them out, the ItemProcessor sits in between to transform the item or apply other business processing.
ItemWriter - ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the item that was passed in its current invocation.
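The stateful-ItemReader use case mentioned in the ExecutionContext entry above, together with the ItemReader contract (forward-only, null when exhausted), can be sketched with a plain Map standing in for the ExecutionContext. This is a simplified illustration; in the real framework the ExecutionContext is persisted by the JobRepository between runs, which is what makes restart possible:

```java
import java.util.List;
import java.util.Map;

public class RestartableReaderSketch {
    /** Key under which the reader checkpoints its position (hypothetical name). */
    static final String KEY = "reader.current.index";

    /**
     * Reads the next item, recording its position in the context so that a
     * restarted run can resume from where the failed run left off.
     * Returns null when the input is exhausted, per the ItemReader contract.
     */
    public static String read(List<String> rows, Map<String, Object> ctx) {
        int index = (int) ctx.getOrDefault(KEY, 0);
        if (index >= rows.size()) {
            return null;               // exhausted: signal with null
        }
        ctx.put(KEY, index + 1);       // checkpoint the current row
        return rows.get(index);
    }
}
```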
Introducing Spring Batch series (three parts):
http://keyholesoftware.com/2012/06/22/introducing-spring-batch/
Batch processing in Java with Spring batch (four parts):
http://java-success.blogspot.com/2012/06/batch-processing-in-java-with-spring.html
Sources:
A rough introduction in Chinese (slides):
http://www.slideshare.net/chijq/spring-batch
Spring Batch – Imperfect Yet Worthwhile:
http://www.summa-tech.com/blog/2012/01/23/spring-batch-imperfect-yet-worthwhile/
http://www.davenkin.me/post/2012-10-17/40039048526
Looking for some good examples?
Spring Batch - Hello World:
http://java.dzone.com/news/spring-batch-hello-world-1
Quote:
A batch Job is composed of one or more Steps. A JobInstance represents a given Job, parametrized with a set of typed properties called JobParameters. Each run of a JobInstance is a JobExecution. Imagine a job reading entries from a database, generating an XML representation of them, and then doing some clean-up. We have a Job composed of 2 steps: reading/writing and clean-up. If we parametrize this job by the date of the generated data, then our Friday-the-13th job is a JobInstance. Each time we run this instance (after a failure, for instance) is a JobExecution. This model gives great flexibility regarding how jobs are launched and run. This naturally brings us to launching jobs with their job parameters, which is the responsibility of JobLauncher. Finally, various objects in the framework require a JobRepository to store runtime information related to the batch execution. In fact, the Spring Batch domain model is much more elaborate, but this will suffice for our purpose.
What happens if a process throws an exception?
http://alain-cieslik.com/2011/06/06/springbatch-what-append-if-a-process-throws-an-exception/
http://forum.springsource.org/showthread.php?61042-Spring-Batch-beginners-tutorial
http://stackoverflow.com/questions/1609793/how-can-i-get-started-with-spring-batch