OCR Test DaSch
Version 1

Workflow Type: Galaxy

A workflow showing how material from DaSch can be processed in Galaxy. The example runs optical character recognition (OCR) on a German newspaper from DaSch: the text is made machine-readable, cleaned, stripped of punctuation, and visualised in a word cloud.

Inputs

ID | Name | Description | Type
Input Image | Input Image | Upload of images to be made machine-readable via OCR | File
Upload Stopwords | Upload Stopwords | n/a | File

Steps

ID | Name | Description
2 | Tesseract | toolshed.g2.bx.psu.edu/repos/iuc/tesseract/tesseract/5.5.1+galaxy0
3 | Text cleaning | Remove single items shown at the beginning or end of the line. toolshed.g2.bx.psu.edu/repos/galaxyp/regex_find_replace/regex1/1.0.3
4 | Remove Punctuation for later Visualisation | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_replace_in_line/9.5+galaxy2
5 | Generate a word cloud | toolshed.g2.bx.psu.edu/repos/bgruening/wordcloud/wordcloud/1.9.4+galaxy3
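
The text-processing steps after OCR (steps 3 to 5) can be approximated outside Galaxy with a short Python sketch. This is an illustration under assumptions, not the workflow's actual configuration: the regular expressions, the function names (clean_line, strip_punctuation, word_frequencies), the sample text, and the stopword list are all hypothetical, and the real patterns used by the Galaxy tools are not given on this page. The word counts produced at the end are the kind of input a word cloud generator weights words by.

```python
import re
from collections import Counter


def clean_line(line: str) -> str:
    """Drop a stray single character at the start or end of a line.

    Assumption: such isolated characters are OCR noise. Note that this
    would also remove a legitimate one-letter word in those positions.
    """
    line = line.strip()
    line = re.sub(r"^\S\s+", "", line)  # one non-space char + whitespace at line start
    line = re.sub(r"\s+\S$", "", line)  # whitespace + one non-space char at line end
    return line


def strip_punctuation(text: str) -> str:
    """Replace every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so umlauts survive


def word_frequencies(text: str, stopwords: set) -> Counter:
    """Count lower-cased words, skipping stopwords."""
    words = (w.lower() for w in text.split())
    return Counter(w for w in words if w not in stopwords)


# Hypothetical OCR output and stopword list, for illustration only.
sample_ocr = "| Die Zeitung berichtet, dass\ndie Stadt wächst. ;\nDie Stadt und die Zeitung!"
stopwords = {"die", "und", "dass"}

cleaned = "\n".join(clean_line(line) for line in sample_ocr.splitlines())
freqs = word_frequencies(strip_punctuation(cleaned), stopwords)
print(freqs.most_common())
```

In the workflow itself, the stopword list comes from the "Upload Stopwords" input rather than being hard-coded as it is here.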

Outputs

ID | Name | Description | Type
out_file1 | out_file1 | n/a | File
output | output | n/a | File

Version History

Version 1 (earliest) Created 6th Nov 2025 at 18:18 by Łukasz Opioła

Initial commit


Creators and Submitter
Creators
  • Johannes Nussbaum
  • Daniela Schneider
Submitter

Created: 6th Nov 2025 at 18:18

