langchain4j [Feature] Add a language code splitter

vyswwuz2 posted 4 months ago in Other

Is your feature request related to a problem? Please describe.

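I want to split source code files, for example Java and Python files. LangChain (Python) supports this:
https://python.langchain.com/v0.2/docs/how_to/code_splitter
However, my service is written in Java, so I cannot use the original Python LangChain language splitter.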

Describe the solution you'd like

Provide a Java version of the code splitter.

ktecyv1j #2

Can I give this a try?

mzaanser #3

@Kugaaa sure, go ahead! What exactly do you plan to implement?

4dc9hkyq #4

> @Kugaaa sure, go ahead! What exactly do you plan to implement?

I have studied how LangChain does this: in essence, it splits code with a TextSplitter.
I would like to follow LangChain's approach and build on the abstract class HierarchicalDocumentSplitter:

  • Implement DocumentByKeywordsSplitter, which recursively splits a segment on the given keyword list whenever the segment exceeds the configured chunk size.
  • Implement DocumentByCodeSplitter, which extends DocumentByKeywordsSplitter and sets its keyword list to the keywords of the corresponding language's syntax.
  • LangChain also has a mechanism for merging chunks; I would like to see whether it can be implemented here to merge the smaller pieces produced by the recursive splitting.
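
To make the plan above more concrete, here is a rough, standalone sketch of the idea in plain Java. It is not langchain4j or LangChain code: the class name KeywordSplitterSketch, its methods, and the example keyword list are all hypothetical, and chunk size is measured in characters rather than tokens to keep it short. It tries the keywords in order, cuts an oversized piece in front of each keyword occurrence, recurses with the next keyword if a piece is still too large, and finally merges neighbouring pieces greedily while they fit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of langchain4j or LangChain.
class KeywordSplitterSketch {

    /** Splits text on the keywords (tried in order) and then merges small neighbours. */
    static List<String> split(String text, List<String> keywords, int maxChars) {
        List<String> pieces = new ArrayList<>();
        splitRecursively(text, keywords, 0, maxChars, pieces);
        return merge(pieces, maxChars);
    }

    private static void splitRecursively(String text, List<String> keywords, int level,
                                         int maxChars, List<String> out) {
        if (text.isEmpty()) {
            return;
        }
        // Emit the piece if it fits, or if no keyword is left to split on
        // (in that case the piece may still exceed maxChars).
        if (text.length() <= maxChars || level >= keywords.size()) {
            out.add(text);
            return;
        }
        String keyword = keywords.get(level);
        if (keyword.isEmpty()) {
            // An empty keyword cannot be used as a cut point; skip it.
            splitRecursively(text, keywords, level + 1, maxChars, out);
            return;
        }
        // Cut in front of every occurrence so each piece keeps its leading keyword.
        int start = 0;
        int idx = text.indexOf(keyword, 1);
        while (idx >= 0) {
            splitRecursively(text.substring(start, idx), keywords, level + 1, maxChars, out);
            start = idx;
            idx = text.indexOf(keyword, idx + keyword.length());
        }
        splitRecursively(text.substring(start), keywords, level + 1, maxChars, out);
    }

    /** Greedily concatenates adjacent pieces while the result stays within maxChars. */
    private static List<String> merge(List<String> pieces, int maxChars) {
        List<String> merged = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String piece : pieces) {
            if (current.length() > 0 && current.length() + piece.length() > maxChars) {
                merged.add(current.toString());
                current.setLength(0);
            }
            current.append(piece);
        }
        if (current.length() > 0) {
            merged.add(current.toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        String source = "class A {\n  void f() {}\n  void g() {}\n}\nclass B {\n  void h() {}\n}\n";
        // Assumed keyword list for Java-like code, from coarse to fine.
        List<String> javaKeywords = List.of("\nclass ", "\n  void ", "\n");
        for (String chunk : split(source, javaKeywords, 30)) {
            System.out.println("[" + chunk.length() + " chars] " + chunk.replace("\n", "\\n"));
        }
    }
}
```

Because pieces are only cut and re-joined, never trimmed, concatenating the chunks gives back the original text. A DocumentByCodeSplitter along these lines would mostly be a per-language keyword list passed to the same routine.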
