CXCSCMU GroupWiki
== LTI Babel Cluster ==
For onboarding students, please carefully read the instruction [https://docs.google.com/presentation/d/1AgyKU72PrZ5O2JVbNB-7t6kI1C35UrQjVLZPSLuPv7k/edit?usp=sharing slides].
== Organize your code ==
Create your '''own''' branch for development and '''regularly''' push changes to your branch! Here are some development steps you may refer to:
# For new features/experiments, it is highly recommended to create a new branch
# After developing and thoroughly testing, merge the new features into your branch
# If you think these features can benefit everyone in the group, merge them into the main branch via a pull request (peers should review and test the code)
== Organize your results ==
=== Loss curve ===
If you train a model, the loss curve is vital for debugging, gaining insights, and reproducing your results.
It is recommended to use [https://wandb.ai/zhiyuan-chenyan-zhenghao-group?shareProfileType=copy wandb] to show your loss curve:
""" | |||
Wandb setup | |||
""" | |||
wandb_project = YOUR_PROJECT_NAME | |||
# IMPORTANT! Record the most important hyperparameters here | |||
wandb_run_name = MODEL_NAME-DATASET_NAME-BATCH_SIZE-LEARNING_RATE-YOUR_RUN_NAME | |||
# Optional but highly recommended | |||
hparams = YOUR_HYPER_PARAMS_DICT | |||
out_dir = YOUR_WANDB_OUTPUT_DIR | |||
# You may be asked to fill in the API key. Find it online | |||
wandb.init(project=wandb_project, name=wandb_run_name, config=hparams, dir=out_dir) | |||
""" | |||
Wandb logging | |||
""" | |||
wandb.log({ | |||
"step": train_step, | |||
"train/loss": train_loss, | |||
"val/loss": val_loss, | |||
"step time": step_time, | |||
"lr": lr, | |||
}) | }) | ||
While training, you can keep track of the curve online and export a report to share at the end of the run. For your reference, this is an [https://wandb.ai/zhiyuan-chenyan-zhenghao-group/Efficient%20Pre-train/reports/Pythia-160M-Pre-trained-with-ClueWeb22--Vmlldzo1NTg3NDc4 example report].
=== Evaluation numbers ===
Create a folder named after yourself or your project under the Google Drive folder [https://drive.google.com/drive/folders/1idKTArwYoInnC_gRaag8nPZ6JEqx86Mm?usp=share_link CXCSCMU_Group] and use Google Sheets to present the numbers.
* Make sure the column/row names clearly describe the model configuration you are actually evaluating
** ❌ Pythia
** ✅ Pythia-160M full-model fine-tuned with SST-2 for 1 epoch, lr=1e-5, bs=8
* Group model sets that can be compared fairly, in the same format as a research paper
== Codebases ==
=== [https://github.com/cxcscmu/Lightning-Pretrain Lightning-Pretrain] ===
An LLM codebase built on Lit-GPT and PyTorch Lightning, especially useful for efficiently pre-training LMs from scratch.
'''What can it do:'''
* Pre-train state-of-the-art decoder-only models (Llama, Llama 2, Pythia, Vicuna, GPT-2, ...)
* Fine-tune on task-specific data
* Evaluate with the [https://github.com/EleutherAI/lm-evaluation-harness Language Model Evaluation Harness] (see the sketch below)
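For a quick sanity check of a checkpoint on the harness, you can also call it directly from Python. Below is a minimal sketch, assuming the pip-installable <code>lm_eval</code> package (v0.4+ API) and a model already converted to Hugging Face format; the model name, tasks, and batch size are illustrative placeholders, not the repo's built-in evaluation script.
 import lm_eval
 # Evaluate an HF-format checkpoint on a couple of harness tasks (all values are placeholders)
 results = lm_eval.simple_evaluate(
     model="hf",
     model_args="pretrained=EleutherAI/pythia-160m",
     tasks=["lambada_openai", "hellaswag"],
     num_fewshot=0,
     batch_size=8,
 )
 print(results["results"])  # per-task metrics you can paste into your results sheet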
'''Pros:'''
* State-of-the-art distributed training strategies: DDP, FSDP, DeepSpeed (see the sketch after this list)
* Modern acceleration strategies: FlashAttention, Fused Adam, mixed precision
* Parameter-efficient fine-tuning: Adapter, Adapter v2, LoRA, QLoRA, ...
* Large-scale evaluation coverage: almost every common NLP task, and the task list keeps growing
* Training speed comparable to Hugging Face, with better flexibility
* Relatively easy to convert model weights from/to Hugging Face via name mapping
* Detailed [https://github.com/cxcscmu/Lightning-Pretrain/tree/main/tutorials tutorials] for each use case, making it easy to get started
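To illustrate how the distributed and mixed-precision options above are typically enabled, here is a minimal PyTorch Lightning Fabric sketch. It is not the repo's actual training script (which wires these options up for you); <code>build_model</code>, <code>build_loader</code>, and the hyperparameters are placeholders.
 import torch
 import lightning as L
 # Placeholders: build these however your experiment defines them
 model = build_model()          # e.g., a decoder-only transformer
 train_loader = build_loader()  # your (pre-)training dataloader
 # FSDP across 4 GPUs with bf16 mixed precision, analogous to the options listed above
 fabric = L.Fabric(accelerator="cuda", devices=4, strategy="fsdp", precision="bf16-mixed")
 fabric.launch()
 optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
 model, optimizer = fabric.setup(model, optimizer)
 train_loader = fabric.setup_dataloaders(train_loader)
 for step, batch in enumerate(train_loader):
     optimizer.zero_grad()
     loss = model(batch)          # assumes the model returns the loss
     fabric.backward(loss)        # replaces loss.backward() under FSDP/mixed precision
     optimizer.step()
Swapping <code>strategy="fsdp"</code> for <code>"ddp"</code> or <code>"deepspeed"</code> selects the other strategies listed above.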
'''Cons:'''
* Does not support models with other architectures, such as T5 or BERT
* Does not support as many training datasets as Hugging Face; you may need to define the dataset class or preprocess the dataset yourself
* Still in development and requires everyone's effort to maintain it
=== [https://github.com/cxcscmu/OpenMatch OpenMatch V2.0] ===
(Yu et al., 2022) [https://dl.acm.org/doi/abs/10.1145/3539618.3591813 Paper] | [https://github.com/OpenMatch/OpenMatch Github] | [https://openmatch.readthedocs.io/en/latest/ Docs]
A Python-based library for conducting Neural Information Retrieval (Neu-IR) research experiments. The library contains both neural and traditional IR modules, making it easy to run baseline experiments for comparison.
'''What can it do:'''
* ''Template-based Data Processing.'' Convenient templates for processing raw data; no need to reformat your data to match the software's expected input.
* ''Efficient Data Accessing.'' Integrated with Hugging Face Datasets, which enables access to large datasets with minimal memory overhead.
* ''Sharded Search.'' Implements two-stage sharded search, which avoids loading the whole corpus into memory on a single machine (see the sketch at the end of this section).
* A sample of supported models: DPR, ANCE, T5, BERT, etc.
'''Requirements:'''
* PyTorch
* Hugging Face Datasets
* Transformers
* Faiss
A working directory of OpenMatch V2.0 exists in <code>/data/group_data/cx_group/OpenMatch</code> on the Babel server.
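To make the ''Sharded Search'' idea above concrete, here is a minimal sketch of two-stage sharded search with Faiss and NumPy. It only illustrates the idea (search each shard separately, then merge the per-shard top-k); it is not OpenMatch's internal implementation, and the embeddings are synthetic placeholders.
 import numpy as np
 import faiss
 d, k = 128, 10
 rng = np.random.default_rng(0)
 queries = rng.random((4, d), dtype=np.float32)                         # placeholder query embeddings
 shards = [rng.random((1000, d), dtype=np.float32) for _ in range(3)]   # placeholder corpus shards
 all_scores, all_ids = [], []
 for shard_id, passages in enumerate(shards):
     index = faiss.IndexFlatIP(d)                # stage 1: index and search one shard at a time
     index.add(passages)
     scores, ids = index.search(queries, k)      # per-shard top-k
     all_scores.append(scores)
     all_ids.append(ids + shard_id * 1000)       # offset local ids into a global id space
 scores = np.concatenate(all_scores, axis=1)     # stage 2: merge the per-shard candidates
 ids = np.concatenate(all_ids, axis=1)
 order = np.argsort(-scores, axis=1)[:, :k]      # global top-k per query
 top_ids = np.take_along_axis(ids, order, axis=1)
Searching shard by shard keeps peak memory bounded by the largest shard rather than by the full corpus.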
== Paper reading ==
By default, research assistants are expected to read at least five papers per week. For your reference, this is an [https://dolomite-marquis-d9f.notion.site/39b0efb8d4be45a1a45a22e3a2188c8f?v=0322d76cf09444f5bec479e17ce5bda6 example].
== Paper spotlight meetings ==
=== Meeting 1 - October 6, 2023 ===
{| class="wikitable"
|+
!Paper
!#Votes
|-
|[https://openreview.net/pdf?id=WgbcOQMNXB Large Language Models Are Not Zero-Shot Communicators]
|2
|-
|[https://openreview.net/pdf?id=c4m0BkO4OL Towards Structured Sparsity in Transformers for Efficient Inference]
|3
|-
|[https://arxiv.org/pdf/2305.02869.pdf 2x Faster Language Model Pre-training via Masked Structural Growth]
|1
|-
|[https://arxiv.org/pdf/2309.17453.pdf Efficient Streaming Language Models with Attention Sinks]
|1
|-
|[https://arxiv.org/abs/2308.07922 RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models]
|0
|-
|[https://arxiv.org/pdf/2309.11495.pdf Chain-of-Verification Reduces Hallucination in Large Language Models]
|3
|-
|[https://arxiv.org/abs/2304.13060 Pre-train on just structure: Understanding linguistic inductive biases using transfer learning]
|2
|-
|[https://arxiv.org/pdf/2302.06675.pdf Symbolic Discovery of Optimization Algorithms]
|0
|-
|[https://arxiv.org/pdf/2204.00185.pdf Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings]
|3
|-
|[https://arxiv.org/abs/2206.02743 A Neural Corpus Indexer for Document Retrieval]
|1
|-
|[https://arxiv.org/pdf/2309.15088.pdf RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models]
|1
|-
|[https://arxiv.org/pdf/2309.07124.pdf RAIN: Your Language Models Can Align Themselves without Finetuning]
|0
|}
=== Meeting 2 - November 3, 2023 ===
{| class="wikitable"
|+
!Paper
!#Votes
|-
|[https://arxiv.org/pdf/2310.11511.pdf SELF-RAG: Learning to Retrieve, Generate and Critique through Self-reflection]
|4
|-
|[https://arxiv.org/pdf/2310.11716.pdf Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning]
|2
|-
|[https://arxiv.org/abs/2310.14034 Tree Prompting: Efficient Task Adaptation without Fine-Tuning]
|2
|-
|[https://arxiv.org/abs/2306.04488 Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards]
|2
|-
|[https://arxiv.org/pdf/2309.12307.pdf LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models]
|2
|-
|[https://arxiv.org/pdf/2304.15004.pdf Are Emergent Abilities of Large Language Models a Mirage?]
|2
|-
|[https://arxiv.org/abs/2205.14135 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness]
|2
|-
|[https://arxiv.org/pdf/2310.11451.pdf Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective]
|1
|-
|[https://arxiv.org/pdf/2301.13808.pdf Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning]
|1
|-
|[https://aclanthology.org/2021.acl-long.568/ Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning]
|1
|-
|[https://arxiv.org/pdf/2310.17680v1.pdf CodeFusion: A Pre-trained Diffusion Model for Code Generation]
|0
|-
|[https://arxiv.org/abs/2309.08532 Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers]
|0
|-
|[https://openreview.net/pdf?id=9k27IITeAZ ChunkAttention: Efficient Attention on KV Cache with Chunking Sharing and Batching]
|0
|-
|[https://arxiv.org/abs/2310.05029 Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading]
|0
|}
=== Meeting 3 - December 1, 2023 ===
{| class="wikitable"
!Paper
!#Votes
|-
|[https://arxiv.org/abs/2112.08633 Learning To Retrieve Prompts for In-Context Learning]
|3
|-
|[https://arxiv.org/pdf/2311.03099.pdf Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch]
|2
|-
|[https://arxiv.org/abs/2310.10638 In-Context Pretraining: Language Modeling Beyond Document Boundaries]
|2
|-
|[https://arxiv.org/pdf/2311.11829.pdf System 2 Attention]
|2
|-
|[https://arxiv.org/pdf/2311.15436.pdf Learning to Skip for Language Modeling]
|2
|-
|[https://arxiv.org/pdf/2306.17842.pdf SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs]
|2
|-
|[https://openaccess.thecvf.com/content/ICCV2023/papers/Jin_Growing_a_Brain_with_Sparsity-Inducing_Generation_for_Continual_Learning_ICCV_2023_paper.pdf Growing a Brain with Sparsity-Inducing Generation for Continual Learning]
|2
|-
|[https://arxiv.org/abs/1911.00172 Generalization through Memorization: Nearest Neighbor Language Models]
|2
|-
|[https://arxiv.org/pdf/2309.08872.pdf PDFTriage: Question Answering over Long, Structured Documents]
|1
|-
|[https://arxiv.org/pdf/2311.01906.pdf Simplifying Transformer Blocks]
|1
|-
|[https://arxiv.org/abs/2311.03348 Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models]
|1
|-
|[https://arxiv.org/pdf/2311.05965v1.pdf Large Language Models are Zero Shot Hypothesis Proposers]
|0
|-
|[https://arxiv.org/pdf/2303.17605.pdf SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer]
|0
|-
|[https://arxiv.org/abs/2310.07298 Beyond Memorization: Violating Privacy Via Inference with Large Language Models]
|0
|}
=== Meeting 4 - February 2, 2024 ===
{| class="wikitable"
!Paper
!#Votes
|-
|[https://arxiv.org/pdf/2401.08565.pdf Tuning Language Models by Proxy]
|5
|-
|[https://arxiv.org/pdf/2304.04171.pdf Learning to Tokenize for Generative Retrieval]
|4
|-
|[https://arxiv.org/pdf/2401.13275.pdf Can AI Assistants Know What They Don’t Know?]
|3
|-
|[https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf Lost in the Middle: How Language Models Use Long Contexts]
|2
|-
|[https://arxiv.org/pdf/2305.09785.pdf Distilling Semantic Concept Embeddings from Contrastively Fine-tuned Language Models]
|2
|-
|[https://arxiv.org/pdf/2202.00622.pdf Datamodels: Predicting Predictions from Training Data]
|2
|-
|[https://arxiv.org/pdf/2401.04858.pdf User Embedding Model for Personalized Language Prompting]
|1
|-
|[https://arxiv.org/abs/2204.07496 Improving Passage Retrieval with Zero-Shot Question Generation]
|1
|-
|[https://arxiv.org/pdf/2305.14583.pdf Text Representation Distillation via Information Bottleneck Principle]
|1
|-
|[https://arxiv.org/pdf/2401.10020.pdf Self-Rewarding Language Models]
|0
|-
|[https://openreview.net/forum?id=oXYZJXDdo7 Retrieval is Accurate Generation]
|0
|-
|[https://dl.acm.org/doi/abs/10.1145/3539618.3591834 Learning Query-aware Embedding Index for Improving E-commerce Dense Retrieval]
|0
|-
|[https://openreview.net/pdf?id=FG8b2I2AkF How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark]
|0
|-
|[https://arxiv.org/pdf/2010.09030.pdf Explaining and Improving Model Behavior with k Nearest Neighbor Representations]
|0
|}
== Building from open source ==
A formalization of the process of building from open-source repositories.
=== Incremental validation ===
Incremental validation of an open-source repository is highly recommended so that you can be sure all components perform as expected.
* Evaluate the performance of published outputs
** This is quick to do: score the outputs the authors publish and check whether the performance agrees with what they claim (see the sketch after this list).
** For example, an “output” could be the predicted ranking on a dev set in a retrieval task.
* Evaluate performance on published artefacts
** For example, if trained embeddings are available, validate performance on them using the prediction step of the downloaded trained model.
** Such artefacts are often unavailable due to large file sizes (see the inference step below).
* Run an inference (forward) pass
** To obtain predictions on the downstream task
** Follow the repository's instructions for this step.
** Check whether performance agrees with the published claims.
* Run training
** Required if you are further fine-tuning the model (or parts thereof)
** Follow up with the inference pass above
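As a concrete example of the first step above, the following sketch scores a published retrieval run against dev-set relevance labels and lets you compare the result with the paper's reported number. The file names and the run format (qid, docid, rank per whitespace-separated line) are hypothetical placeholders; adapt them to the repository you are validating.
 from collections import defaultdict
 # Hypothetical files: a published run and dev-set qrels, both whitespace-separated
 run = defaultdict(list)                     # qid -> [(rank, docid), ...]
 with open("published_run.dev.txt") as f:
     for line in f:
         qid, docid, rank = line.split()[:3]
         run[qid].append((int(rank), docid))
 qrels = defaultdict(set)                    # qid -> set of relevant docids
 with open("qrels.dev.txt") as f:
     for line in f:
         qid, docid = line.split()[:2]
         qrels[qid].add(docid)
 # MRR@10 over queries that have relevance labels
 rr = []
 for qid, ranked in run.items():
     if qid not in qrels:
         continue
     top10 = [docid for _, docid in sorted(ranked)][:10]
     rr.append(next((1.0 / (i + 1) for i, d in enumerate(top10) if d in qrels[qid]), 0.0))
 mrr = sum(rr) / len(rr)
 print(f"Reproduced MRR@10 = {mrr:.4f}; compare against the value reported in the paper")
If the reproduced number is far from the reported one, debug the evaluation setup before touching training.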
=== Mitigation / debugging when validation fails ===
Some reasons why code doesn't work, and possible solutions.
* The code is not written to open-source standards
** The code may have worked for the authors, but it is not generally usable
** Sometimes the fixes are easy, e.g., checking input/output directories
* The code is from a while ago
** Check the package versions the authors used, and replicate them in a venv if possible
** If you intend to fine-tune the model for future use, you probably want to update the code; see the documentation for any breaking changes
* Needed components are missing
** The authors probably based their project on someone else's; dig around to see whether the missing component exists anywhere else on the internet
* Everything runs without error, but the results are just not replicable 😡
** Check data processing (look out for corrupted/truncated files)
** Check model parameters (especially optimization parameters)
** Check that the “default” parameters in the args correspond to the optimal values reported in the paper
It is always a good idea to check the GitHub issues to see whether your problem is a “known” problem and whether any workarounds exist.
== Debugging deep learning experiments ==
A step-by-step guide to debugging deep learning experiments.
=== Step-by-step guide ===
* Reduce as many external factors as possible and try to localize the error/issue. Factors to reduce include:
** Number of GPUs used in training
** Dataset size
** Dataset complexity (use a toy dataset)
* Check both the input and output of every function in the experiment. Do not assume you know what a function is doing
* Prioritize rapid debugging
** Reduce training steps
** Log output more frequently
* Avoid starting from scratch; leverage open-source implementations
* Write test cases (see the sketch after this list)
* Avoid multi-processing until the issue is identified
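As a starting point for the test cases mentioned above, one useful sketch is an "overfit one toy batch" test: with a tiny model and a single fixed batch, the loss should drop close to zero within a few hundred steps, otherwise something in the data pipeline, loss, or optimizer is broken. The toy model and batch below are placeholders; swap in your own components.
 import torch
 import torch.nn as nn
 def test_overfit_one_batch():
     torch.manual_seed(0)
     model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # placeholder toy model
     x = torch.randn(8, 16)                     # single fixed toy batch
     y = torch.randint(0, 4, (8,))
     optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
     loss_fn = nn.CrossEntropyLoss()
     for _ in range(300):
         optimizer.zero_grad()
         loss = loss_fn(model(x), y)
         loss.backward()
         optimizer.step()
     # Check shapes/dtypes explicitly instead of assuming what a function returns
     assert model(x).shape == (8, 4)
     assert loss.item() < 0.05, f"failed to overfit a single batch, final loss={loss.item():.3f}"
 test_overfit_one_batch()
The same pattern scales down a real experiment: a toy dataset, few steps, one process, and explicit assertions on intermediate shapes and values.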