
extract

tool to extract text from major document formats (namespace: extract)

supported formats

instantiate the class, passing a file extension as the parameter.

var $extract : cs.extract.extract
$extract:=cs.extract.extract.new(".docx")

use cs.extract.formats to get the list of supported formats.

$extensions:=cs.extract.formats.new().extensions
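a minimal sketch of checking a file against the supported list before instantiating the class. it assumes that .extensions is a collection of text values with a leading dot (the same form as ".docx" above), which this section does not confirm; the file path is hypothetical.

```4d
var $extensions : Collection
$extensions:=cs.extract.formats.new().extensions

// hypothetical input file
var $file : 4D.File
$file:=Folder(fk desktop folder).file("sample.docx")

var $extract : cs.extract.extract
// assumption: each element looks like ".docx", matching 4D.File.extension
If ($extensions.indexOf($file.extension)>=0)
	$extract:=cs.extract.extract.new($file.extension)
Else 
	ALERT("unsupported format: "+$file.extension)
End if 
```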

there are 2 ways to invoke .getText(): synchronous and asynchronous.

synchronous: pass a single parameter and receive a collection of results in return.

$texts:=$extract.getText({file: $file})

you can pass a single object or a collection of objects in a single call.
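as a rough illustration of the collection form, the sketch below builds one input object per document and passes them all in a single synchronous call. the folder name is hypothetical, and the shape of each element in the returned collection is not described in this section, so the sketch does not inspect it.

```4d
var $extract : cs.extract.extract
$extract:=cs.extract.extract.new(".docx")

// hypothetical folder holding the documents to process
var $folder : 4D.Folder
$folder:=Folder(fk documents folder).folder("reports")

var $inputs : Collection
$inputs:=[]

var $file : 4D.File
For each ($file; $folder.files())
	If ($file.extension=".docx")
		$inputs.push({file: $file})  // one object per document
	End if 
End for each 

// synchronous: results come back in the returned collection
var $texts : Collection
$texts:=$extract.getText($inputs)
```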

asynchronous: pass a formula as the second parameter. the call returns an empty collection immediately, and the result is delivered to the formula instead.

the formula should have the following signature:

#DECLARE($worker : 4D.SystemWorker; $params : Object)

var $text : Text
$text:=$worker.response

[!TIP] whatever value you pass in data is returned in $params.context

$extract.getText({file: $file.getContent(); data: $file}; Formula(onResponse))

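#DECLARE($worker : 4D.SystemWorker; $params : Object)

// extracted text for this request
var $text : Text
$text:=$worker.response

// the value passed in data by the caller (here, the original file)
var $file : 4D.File
$file:=$params.context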

use context to match each response to the input that produced it.
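a sketch of that matching with several documents in flight, assuming one asynchronous call per file. the folder name, the name property inside data, and the logging are all illustrative choices, not part of the component.

```4d
var $extract : cs.extract.extract
$extract:=cs.extract.extract.new(".docx")

// hypothetical folder holding the documents to process
var $files : Collection
$files:=Folder(fk documents folder).folder("reports").files()

var $file : 4D.File
For each ($file; $files)
	If ($file.extension=".docx")
		// data travels with the request and comes back as $params.context
		$extract.getText({file: $file; data: {name: $file.fullName}}; Formula(onResponse))
	End if 
End for each 
```

the callback can then read the same object back and label its output:

```4d
// project method: onResponse
#DECLARE($worker : 4D.SystemWorker; $params : Object)

var $text : Text
$text:=$worker.response

// $params.context is the object passed in data by the caller
var $name : Text
$name:=$params.context.name

LOG EVENT(Into 4D debug message; $name+": "+String(Length($text))+" characters")
```

passing a small object in data, rather than the file alone, leaves room to carry extra bookkeeping with each request.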

property	type	description
`file`	`4D.File` `4D.Blob` `Text`	input