For more than 3000 contracts, for each I have data of billed sum and payid sum. Payments usually != billed sum, there are can be multiple payments for one billed sum, can be some period without payments at all. For example some piece of data for a contract:
| date | type of operation | sum (billed) | sum (paid) |
| ------------ | ------------ | ------------ | ------------ |
| 30.06.2021 | billed | 6919,07 | |
| 31.07.2021 | billed | 4829,65 | |
| 12.08.2021 | paid | | 3000 |
| 31.08.2021 | billed | 3845,6 | |
| 05.09.2021 | paid | | 10000 |
It's needed to calculate average debt term for a contract, it means the time in days in between billed sum and its full payment. For instance, 6919,07 billed on 30.06.2021 fully paid on 05.09.2021 and for it's billed sum are two debt terms:
- 3000 - debt term 43 days (remain part 3919,07) payd on 12.08.2021
- 3919,07 - debt term 67 days
remain part (3 080,93) of payment made on 05.09.2021 (10000) should be used for sum billed on 31.07.2021
The main goal - calculate over all contracts sum of debt < 1 month, 1-2 months, 2-5 month and more than 6 month, and calculate it every day to see in dynamic.
I only can imagined very complicated algoritm, with creation 2 addition tables for payment and debts like stack where stored only partly "used" sum, and also add flags for source tables to indicate if sum were fully paid. May be it's better to use trees or another kind of data structures. But my "algoritm" is very complicated for programming and debuging. Please help me to find more pithonic way to solve the issue.
1条答案
按热度按时间kmpatx3s1#
我在这里只展示如何使用Python的标准库来检索所有账单债务的全额支付日期,以便更好地展示这是如何工作的。使用panda这样的库可以更快地完成。这里的日期是字符串,但数据类型对这段代码来说并不重要。
其思想是简单地为到某个日期为止的账单的累计总额找到累计付款总额大于或等于的第一个日期。
开单事件和付款事件的单独表(在此处存储为列列表):
我们需要帐单金额和已付金额的累计总和(作为新列附加到表中:
查找并添加全额支付的日期。由于已付债务的累计总额正在增加(按顺序),因此可以使用平分算法:
billed
最后包含累积和(不再需要)以及每张账单全额支付日期的附加列: