Name: Word Count
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Description
Wordcount is an application that counts the number of words in a given set of files.
To allow parallelism, the file is divided into blocks that are processed separately and merged afterwards.
Results are written to a Pickle binary file, so they can be checked using: python -m pickle result.txt
This example also shows how to manually add input or output datasets to the workflow provenance recording (using the 'input' and 'output' terms in the ro-crate-info.yaml file).
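The block-wise strategy described above can be sketched in plain Python. This is only an illustration, not the code in wordcount_blocks.py: all function names are hypothetical, blocks are measured in characters and extended to the next whitespace so that no word is cut in two, and the per-word dictionary layout of the result is an assumption. In the COMPSs version, each block would presumably be counted by a separate task.

```python
import pickle
from collections import Counter


def split_into_blocks(text, block_size):
    """Cut text into blocks of about block_size characters (hypothetical helper)."""
    blocks, start = [], 0
    while start < len(text):
        end = min(start + block_size, len(text))
        # Extend the block to the next whitespace so no word is split.
        while end < len(text) and not text[end].isspace():
            end += 1
        blocks.append(text[start:end])
        start = end
    return blocks


def count_block(block):
    """Count the words of one block; this is the independent, parallelisable step."""
    return Counter(block.split())


def merge_counts(partials):
    """Merge the per-block counts into a single dictionary."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def save_result(counts, path):
    """Write the merged counts to a Pickle binary file."""
    with open(path, "wb") as handle:
        pickle.dump(counts, handle)


def load_result(path):
    """Read the merged counts back, as an alternative to python -m pickle."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A typical call sequence would be split_into_blocks, then count_block on every block, then merge_counts on the partial results, and finally save_result with the chosen result path.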
Execution instructions
Usage:
runcompss --lang=python $(pwd)/application_sources/src/wordcount_blocks.py filePath resultPath blockSize
where:
- filePath: Absolute path of the file to parse
- resultPath: Absolute path to the result file
- blockSize: Size of each block. The lower the number, the more tasks will be generated in the workflow
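The relation between blockSize and the amount of work can be estimated with a small helper. This is a rough sketch under two assumptions that the page does not state: that blockSize is expressed in the same unit as the file size (for example bytes), and that one counting task is created per block. The function name is hypothetical.

```python
def estimate_block_count(file_size, block_size):
    """Estimate how many blocks a file of file_size units is divided into."""
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")
    # Integer ceiling division: a final partial block still counts as a block.
    return -(-file_size // block_size)
```

Under these assumptions, halving blockSize roughly doubles the number of blocks, which matches the note that a lower number generates more tasks.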
Execution Examples
runcompss --lang=python $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300
runcompss $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300
python -m pycompss $(pwd)/application_sources/src/wordcount.py $(pwd)/dataset/data/compss.txt result.txt 300
Build
No build is required
Version History
COMPSs 3.3 (earliest) Created 15th Dec 2023 at 14:57 by Raül Sirvent
Run using COMPSs version 3.3 on the MareNostrum IV supercomputer, using 1 node (48 cores).
Frozen
COMPSs-3.3
4d9de37
Creator
Additional credit
The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)
Submitter