[BUG] Operator Remove Extra spaces producing more data than needed? #290

@joule19

Description

After running the Remove extra spaces operator, the output dataset contains 1500 plain-text files, while the input dataset had only 1225 files. Why? In theory, the operator should only remove space characters at the start and at the end of each file, so the number of files should not change.
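One way to narrow this down is to compare the file names in the two datasets and list what appears only in the output. The following is a minimal sketch, not part of DataMate: the helper names `list_files` and `compare_datasets` are hypothetical, and it assumes both datasets are reachable as plain directories on disk.

```python
from pathlib import Path


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_datasets(input_dir, output_dir):
    """Return (extra, missing).

    extra:   names present only in the output dataset
    missing: names present only in the input dataset
    """
    inputs = list_files(input_dir)
    outputs = list_files(output_dir)
    return sorted(outputs - inputs), sorted(inputs - outputs)


# Example call (both paths are placeholders, adjust them to the real datasets):
# extra, missing = compare_datasets("/path/to/input_dataset", "/path/to/output_dataset")
# print(len(extra), "files only in output;", len(missing), "files only in input")
```

If `extra` is empty even though the reported counts differ, the mismatch is more likely in how the dataset records are counted than in the files written to disk.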

Run log (1st run):
[INFO]
2026-01-26 22:59:54.625 | INFO | datamate.wrappers.executor:__init__:32 - Initing Ray ...
[WARNING]
2026-01-26 22:59:57,844 WARNING services.py:2137 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67076096 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=4.72gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
[INFO]
2026-01-26 22:59:59,989 INFO worker.py:1998 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
[WARN]
/usr/local/lib/python3.11/site-packages/ray/_private/worker.py:2046: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
[WARN]
warnings.warn(
[INFO]
2026-01-26 23:00:00.954 | INFO | __main__:run:30 - Loading dataset with Ray...
[INFO]
2026-01-26 23:00:02.112 | INFO | __main__:run:41 - Processing data...
[INFO]
2026-01-26 23:00:02.122 | INFO | datamate.core.dataset:load_ops_module:146 - Import Ops module ExtraSpaceCleaner Success.
[INFO]
2026-01-26 23:00:02.124 | INFO | __main__:run:45 - All Ops are done in 0.012s.
[INFO]
2026-01-26 23:00:02,136 INFO logging.py:397 -- Registered dataset logger for dataset dataset_2_0
[INFO]
2026-01-26 23:00:02,152 INFO streaming_executor.py:178 -- Starting execution of Dataset dataset_2_0. Full logs are in /tmp/ray/session_2026-01-26_22-59-54_648514_10746/logs/ray-data
[INFO]
2026-01-26 23:00:02,153 INFO streaming_executor.py:179 -- Execution plan of Dataset dataset_2_0: InputDataBuffer[Input] -> ActorPoolMapOperator[MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)]
[INFO]
2026-01-26 23:00:02,193 INFO streaming_executor.py:686 -- [dataset]: A new progress UI is available. To enable, set `ray.data.DataContext.get_current().enable_rich_progress_bars = True` and `ray.data.DataContext.get_current().use_ray_tqdm = False`.
[WARNING]
2026-01-26 23:00:02,195 WARNING resource_manager.py:136 -- ⚠️ Ray's object store is configured to use only 42.9% of available memory (4.3GiB out of 10.0GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
[WARNING]
2026-01-26 23:00:02,236 WARNING resource_manager.py:761 -- Cluster resources are not enough to run any task from ActorPoolMapOperator[MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)]. The job may hang forever unless the cluster scales up.
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:03.965 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000001.txt, method: ExtraSpaceCleaner costs 0.001191 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:03.966 | INFO | datamate.core.base_op:execute:473 - origin file named 000001.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000001.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:03.966 | INFO | datamate.core.base_op:execute:474 - fileName: 000001.txt, method: FileExporter costs 0.000948 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.038 | INFO | datamate.sql_manager.sql_manager:_get_engine:50 - Database Engine initialized successfully.
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.071 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000002.txt, method: ExtraSpaceCleaner costs 0.000734 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.072 | INFO | datamate.core.base_op:execute:473 - origin file named 000002.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000002.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.072 | INFO | datamate.core.base_op:execute:474 - fileName: 000002.txt, method: FileExporter costs 0.000330 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.078 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000003.txt, method: ExtraSpaceCleaner costs 0.000676 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.079 | INFO | datamate.core.base_op:execute:473 - origin file named 000003.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000003.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.079 | INFO | datamate.core.base_op:execute:474 - fileName: 000003.txt, method: FileExporter costs 0.000390 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.086 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000004.txt, method: ExtraSpaceCleaner costs 0.000665 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.086 | INFO | datamate.core.base_op:execute:473 - origin file named 000004.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000004.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.086 | INFO | datamate.core.base_op:execute:474 - fileName: 000004.txt, method: FileExporter costs 0.000247 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.092 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000005.txt, method: ExtraSpaceCleaner costs 0.000804 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.093 | INFO | datamate.core.base_op:execute:473 - origin file named 000005.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000005.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.093 | INFO | datamate.core.base_op:execute:474 - fileName: 000005.txt, method: FileExporter costs 0.000292 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.099 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000006.txt, method: ExtraSpaceCleaner costs 0.000880 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.100 | INFO | datamate.core.base_op:execute:473 - origin file named 000006.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000006.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.100 | INFO | datamate.core.base_op:execute:474 - fileName: 000006.txt, method: FileExporter costs 0.000281 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.107 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000007.txt, method: ExtraSpaceCleaner costs 0.001991 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.108 | INFO | datamate.core.base_op:execute:473 - origin file named 000007.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000007.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.108 | INFO | datamate.core.base_op:execute:474 - fileName: 000007.txt, method: FileExporter costs 0.000301 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.114 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000008.txt, method: ExtraSpaceCleaner costs 0.000635 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.114 | INFO | datamate.core.base_op:execute:473 - origin file named 000008.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000008.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.114 | INFO | datamate.core.base_op:execute:474 - fileName: 000008.txt, method: FileExporter costs 0.000326 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.129 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000009.txt, method: ExtraSpaceCleaner costs 0.000471 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.129 | INFO | datamate.core.base_op:execute:473 - origin file named 000009.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000009.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.129 | INFO | datamate.core.base_op:execute:474 - fileName: 000009.txt, method: FileExporter costs 0.000409 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.135 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000010.txt, method: ExtraSpaceCleaner costs 0.000652 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.135 | INFO | datamate.core.base_op:execute:473 - origin file named 000010.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000010.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.135 | INFO | datamate.core.base_op:execute:474 - fileName: 000010.txt, method: FileExporter costs 0.000209 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.141 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000011.txt, method: ExtraSpaceCleaner costs 0.000791 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.142 | INFO | datamate.core.base_op:execute:473 - origin file named 000011.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000011.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.142 | INFO | datamate.core.base_op:execute:474 - fileName: 000011.txt, method: FileExporter costs 0.000226 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.147 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000012.txt, method: ExtraSpaceCleaner costs 0.000603 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.147 | INFO | datamate.core.base_op:execute:473 - origin file named 000012.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000012.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.148 | INFO | datamate.core.base_op:execute:474 - fileName: 000012.txt, method: FileExporter costs 0.000225 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.153 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000013.txt, method: ExtraSpaceCleaner costs 0.000658 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.153 | INFO | datamate.core.base_op:execute:473 - origin file named 000013.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000013.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.153 | INFO | datamate.core.base_op:execute:474 - fileName: 000013.txt, method: FileExporter costs 0.000216 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.159 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000014.txt, method: ExtraSpaceCleaner costs 0.000573 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.159 | INFO | datamate.core.base_op:execute:473 - origin file named 000014.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000014.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.159 | INFO | datamate.core.base_op:execute:474 - fileName: 000014.txt, method: FileExporter costs 0.000216 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.165 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000015.txt, method: ExtraSpaceCleaner costs 0.000508 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.165 | INFO | datamate.core.base_op:execute:473 - origin file named 000015.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000015.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.165 | INFO | datamate.core.base_op:execute:474 - fileName: 000015.txt, method: FileExporter costs 0.000312 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.171 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000016.txt, method: ExtraSpaceCleaner costs 0.000554 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.171 | INFO | datamate.core.base_op:execute:473 - origin file named 000016.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000016.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.171 | INFO | datamate.core.base_op:execute:474 - fileName: 000016.txt, method: FileExporter costs 0.000314 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.206 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000017.txt, method: ExtraSpaceCleaner costs 0.000717 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.206 | INFO | datamate.core.base_op:execute:473 - origin file named 000017.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000017.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.206 | INFO | datamate.core.base_op:execute:474 - fileName: 000017.txt, method: FileExporter costs 0.000247 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.214 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000018.txt, method: ExtraSpaceCleaner costs 0.000711 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.214 | INFO | datamate.core.base_op:execute:473 - origin file named 000018.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000018.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.214 | INFO | datamate.core.base_op:execute:474 - fileName: 000018.txt, method: FileExporter costs 0.000296 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.220 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000019.txt, method: ExtraSpaceCleaner costs 0.000471 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.220 | INFO | datamate.core.base_op:execute:473 - origin file named 000019.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000019.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.220 | INFO | datamate.core.base_op:execute:474 - fileName: 000019.txt, method: FileExporter costs 0.000301 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.226 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000020.txt, method: ExtraSpaceCleaner costs 0.000677 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.227 | INFO | datamate.core.base_op:execute:473 - origin file named 000020.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000020.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.227 | INFO | datamate.core.base_op:execute:474 - fileName: 000020.txt, method: FileExporter costs 0.000492 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.233 | INFO | ops.mapper.extra_space_cleaner.process:execute:46 - fileName: 000021.txt, method: ExtraSpaceCleaner costs 0.000601 s
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11429) 2026-01-26 23:00:04.233 | INFO | d
...
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11434) 2026-01-26 23:00:10.309 | INFO | datamate.core.base_op:execute:473 - origin file named 000979.txt has been save to /dataset/8ef99fc2-58c3-4967-90f4-5bf4f37287c9/000979.txt
[INFO]
(MapWorker(MapBatches(process_batch_arrow)->Map(ExtraSpaceCleaner)) pid=11434) 2026-01-26 23:00:10.310 | INFO | datamate.core.base_op:execute:474 - fileName: 000979.txt, method: FileExporter costs 0.000232 s
[INFO]
2026-01-26 23:00:10,385 INFO streaming_executor.py:304 -- ✔️ Dataset dataset_2_0 execution finished in 8.23 seconds
[INFO]
2026-01-26 23:00:10.551 | INFO | datamate.sql_manager.sql_manager:_get_engine:50 - Database Engine initialized successfully.
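Since the log above is truncated, it may help to check the full run log for files that were exported more than once. The sketch below is only an illustration: it assumes the complete log is available as text and that every export is reported with the same `origin file named ... has been save to ...` message seen above. The helper names `count_exports` and `duplicated_exports` are hypothetical and not part of DataMate.

```python
import re
from collections import Counter

# Matches the export message as it appears in the log above
# (the wording "has been save to" is copied verbatim from the log).
EXPORT_RE = re.compile(r"origin file named (\S+) has been save to (\S+)")


def count_exports(log_text):
    """Count how many times each file name was reported as exported."""
    counts = Counter()
    for match in EXPORT_RE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


def duplicated_exports(log_text):
    """Return {file name: count} for files exported more than once."""
    return {name: n for name, n in count_exports(log_text).items() if n > 1}


# Example call (the log file path is a placeholder):
# with open("run.log", encoding="utf-8", errors="replace") as fh:
#     text = fh.read()
# print(len(count_exports(text)), "distinct files exported")
# print(duplicated_exports(text))
```

Comparing the number of distinct exported names with the 1225 input files and the 1500 reported output files would show whether some inputs were processed or recorded twice.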
