Name: Word Count
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs
Description
Wordcount is an application that counts the number of words for a given set of files.
To enable parallelism, the file is divided into blocks that are processed separately and merged afterwards.
Results are written to a binary Pickle file, so they can be inspected using: python -m pickle result.txt
This example also shows how to manually add input or output datasets to the workflow provenance recording (using the 'input' and 'output' terms in the ro-crate-info.yaml file).
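The block-and-merge scheme described above can be sketched in plain Python. This is only an illustration, not the code of wordcount_blocks.py: the function names (read_blocks, count_block, merge_counts, word_count, save_result, load_result) are hypothetical, the sketch assumes the result is a word-to-count dictionary, and it assumes blocks are read as fixed-size chunks of characters. It runs sequentially; in a PyCOMPSs application the counting and merging steps would typically be declared as tasks so they can run in parallel.

```python
import pickle
from collections import Counter
from functools import reduce


def read_blocks(path, block_size):
    """Yield consecutive chunks of at most block_size characters (assumed unit)."""
    with open(path, "r") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            yield block


def count_block(block):
    """Count whitespace-separated words in a single block."""
    return Counter(block.split())


def merge_counts(left, right):
    """Combine two partial counts into one."""
    return left + right


def word_count(path, block_size):
    """Count each block independently, then merge the partial results."""
    partial = [count_block(block) for block in read_blocks(path, block_size)]
    return reduce(merge_counts, partial, Counter())


def save_result(counts, result_path):
    """Write the merged counts to a binary Pickle file."""
    with open(result_path, "wb") as handle:
        pickle.dump(dict(counts), handle)


def load_result(result_path):
    """Read the counts back, as an alternative to python -m pickle."""
    with open(result_path, "rb") as handle:
        return pickle.load(handle)
```

Note that with fixed-size chunks a word that straddles a block boundary is counted as two fragments in this sketch; a real implementation has to decide how to handle that case.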
Execution instructions
Usage:
runcompss --lang=python $(pwd)/application_sources/src/wordcount_blocks.py filePath resultPath blockSize
where:
- filePath: Absolute path of the file to parse
- resultPath: Absolute path to the result file
- blockSize: Size of each block. The lower the number, the more tasks will be generated in the workflow
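
The relation between blockSize and the number of tasks can be illustrated with a rough estimate. The sketch below assumes one counting task per block and that blockSize is expressed in the same unit as the file size; both are assumptions, and estimate_block_count is a hypothetical helper, not part of the application.

```python
import math


def estimate_block_count(file_size, block_size):
    """Estimate how many blocks (and thus counting tasks) a file yields.

    Assumes file_size and block_size use the same unit and that every
    block produces exactly one task.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return math.ceil(file_size / block_size)


# Halving the block size doubles the estimated number of blocks.
blocks_large = estimate_block_count(3000, 300)
blocks_small = estimate_block_count(3000, 150)
```

Smaller blocks expose more parallelism but also add per-task overhead, so the best value depends on the input size and the available cores.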
Execution Examples
runcompss --lang=python $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300
runcompss $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300
python -m pycompss $(pwd)/application_sources/src/wordcount.py $(pwd)/dataset/data/compss.txt result.txt 300
Build
No build is required
Version History
COMPSs 3.3 (earliest) Created 15th Dec 2023 at 14:57 by Raül Sirvent
Run using COMPSs version 3.3 on the MareNostrum IV supercomputer, using 1 node (48 cores).
Frozen: COMPSs-3.3 (4d9de37)

Additional credit
The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)