Workflow Type: Galaxy
A workflow showing how material from DaSch can be processed in Galaxy. The example applies optical character recognition (OCR) to a German newspaper from DaSch: the scanned text is made machine-readable, cleaned, stripped of punctuation, and visualised as a word cloud.
Inputs
| ID | Name | Description | Type |
|---|---|---|---|
| Input Image | Input Image | Uploaded images to be made machine-readable via OCR | |
| Upload Stopwords | Upload Stopwords | n/a | |
Steps
| ID | Name | Description | Tool |
|---|---|---|---|
| 2 | Tesseract | | toolshed.g2.bx.psu.edu/repos/iuc/tesseract/tesseract/5.5.1+galaxy0 |
| 3 | Text cleaning | Remove single items shown at the beginning or end of the line | toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.3 |
| 4 | Remove Punctuation for later Visualisation | | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy2 |
| 5 | Generate a word cloud | | toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.4+galaxy3 |
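The text-processing steps (3 to 5) can be approximated locally to see what each stage does to the OCR output. The following is a minimal sketch using only the Python standard library, not the Galaxy tools themselves. The regular expressions, the function names (`clean_line`, `strip_punctuation`, `word_frequencies`, `process`), and the sample text are all assumptions made for illustration; the workflow's actual patterns are not given here. The OCR step is omitted, and a word-frequency count stands in for the data a word cloud would be drawn from.

```python
import re
from collections import Counter

# Assumed pattern: one stray non-word symbol stranded at the start or
# end of a line, separated from the text by whitespace (common OCR noise).
EDGE_NOISE = re.compile(r"^\s*[^\w\s]\s+|\s+[^\w\s]\s*$")

# Assumed pattern: any character that is neither a word character nor whitespace.
PUNCTUATION = re.compile(r"[^\w\s]")


def clean_line(line: str) -> str:
    """Step 3 (sketch): drop a single stray symbol at either edge of a line."""
    return EDGE_NOISE.sub("", line).strip()


def strip_punctuation(text: str) -> str:
    """Step 4 (sketch): replace punctuation with spaces so words stay separated."""
    return PUNCTUATION.sub(" ", text)


def word_frequencies(text: str, stopwords: set[str]) -> Counter:
    """Step 5 (sketch): count lowercased words, skipping stopwords."""
    words = strip_punctuation(text).lower().split()
    return Counter(word for word in words if word not in stopwords)


def process(ocr_text: str, stopwords: set[str]) -> Counter:
    """Run the cleaning, punctuation removal, and counting stages in order."""
    cleaned = "\n".join(clean_line(line) for line in ocr_text.splitlines())
    return word_frequencies(cleaned, stopwords)


if __name__ == "__main__":
    # Hypothetical OCR output with noise symbols at the line edges.
    sample = "| Die Zeitung erscheint täglich. ~\nDie Zeitung, die Stadt!"
    counts = process(sample, stopwords={"die"})
    print(counts.most_common(3))
```

Punctuation is replaced with a space rather than deleted so that words joined only by a comma or full stop do not fuse into one token. Python's `\w` matches Unicode letters for `str` patterns, so German umlauts and `ß` survive the cleaning.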
Outputs
| ID | Name | Description | Type |
|---|---|---|---|
| out_file1 | out_file1 | n/a | |
| output | output | n/a | |
Version History
Version 1 (earliest) Created 6th Nov 2025 at 18:18 by Łukasz Opioła
Initial commit
master
e756f4a
Creators and Submitter
Activity
Views: 36 Downloads: 6 Runs: 0
Created: 6th Nov 2025 at 18:18
Attributions: None
https://orcid.org/0000-0003-1997-932X