PySpark: how to use a Jinja template with a DataFrame

r8xiu3jd  posted on 2023-10-15  in  Spark

I have a PySpark DataFrame like this:

FirstName  LastName  Score
Hello      World     [('Math', 90), ('Eng', 80)]
ABC        XYZ       [('Math', 90)]

Score is an array of structs in Spark, as shown below:

[Row(sub='Math', score=90), Row(sub='Eng', score=80)]

I want to convert this Score column into a Score_HTML column. The expected output is:

FirstName  LastName  Score_HTML
Hello      World     "<b>FullName:</b>Hello World <br><br> <table border="1"><tr><td>Sub</td><td>Score</td></tr><tr><td>Math</td><td>90</td></tr><tr><td>Eng</td><td>80</td></tr></table>"
ABC        XYZ       "<b>FullName:</b>ABC XYZ <br><br> <table border="1"><tr><td>Sub</td><td>Score</td></tr><tr><td>Math</td><td>90</td></tr></table>"

How can I do this with a Jinja template?
I also tried converting the Spark DataFrame to a Pandas DataFrame and then applying a Jinja template, like this:

import jinja2

environment = jinja2.Environment()
template = environment.from_string(
   """
   <b>FullName:</b>{{ FirstName }} {{ LastName }} </br></br>
   <table border="1"><tr><td>Sub</td><td>Score</td></tr><tr>
   {% for value in df['Score'] %}
      <td>"{{ value['sub'] }}"</td><td>"{{ value['score'] }}"</td>
   {% endfor %}
   </tr></table>
   """
)

df['Score_HTML'] = template.render(FirstName=df['FirstName'], LastName=df['LastName']) ???

I need help defining the Jinja template and applying it to the DataFrame (either Spark or Pandas) to get this result.
Thanks in advance.

am46iovg  #1

I would use the mapInPandas method to render the Jinja template for each row of the Spark DataFrame:

from jinja2 import Template

TEMPLATE =  """
<b>FullName:</b>{{ FirstName }} {{ LastName }} </br></br>
<table border="1">
    <tr>
        <td>Sub</td>
        <td>Score</td>
    </tr>
    {% for s in Score %}
    <tr>
        <td>"{{ s['sub'] }}"</td>
        <td>"{{ s['score'] }}"</td>
    </tr>
    {% endfor %}
</table>
"""

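def render(iterator):
    # Compile the template once per call to render, not once per row.
    template = Template(TEMPLATE)
    for chunk in iterator:
        # Each chunk is a pandas DataFrame; overwrite Score with the rendered HTML.
        chunk['Score'] = chunk.apply(lambda r:
                                     template.render(**r), axis=1)
        yield chunk

# The output schema declares Score as a string because it now holds HTML.
df1 = df.mapInPandas(render, schema="FirstName string, LastName string, Score string")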
df1.show()

+---------+--------+--------------------+
|FirstName|LastName|               Score|
+---------+--------+--------------------+
|    Hello|   World|\n<b>FullName:</b...|
|      ABC|     XYZ|\n<b>FullName:</b...|
+---------+--------+--------------------+
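
The version above overwrites Score and keeps the quotes and `</br>` tags from the question's template. Below is a minimal sketch of a variant that emits a separate Score_HTML column with the markup from the expected output. The names `TEMPLATE_SRC` and `render_html` are hypothetical, and the sketch assumes each struct in Score arrives in pandas as a dict-like object. The Spark call is left as a comment so that the function can be checked locally on a plain pandas DataFrame.

```python
import pandas as pd
from jinja2 import Template

# One-line template source, so no newlines or quotes end up in the HTML.
TEMPLATE_SRC = (
    '<b>FullName:</b>{{ FirstName }} {{ LastName }} <br><br> '
    '<table border="1"><tr><td>Sub</td><td>Score</td></tr>'
    '{% for s in Score %}'
    '<tr><td>{{ s["sub"] }}</td><td>{{ s["score"] }}</td></tr>'
    '{% endfor %}</table>'
)

def render_html(iterator):
    # Compile inside the function so the compiled template never has to be pickled.
    template = Template(TEMPLATE_SRC)
    for chunk in iterator:
        # Keep the name columns and add the rendered HTML as a new column.
        out = chunk[["FirstName", "LastName"]].copy()
        out["Score_HTML"] = [
            template.render(**rec) for rec in chunk.to_dict("records")
        ]
        yield out

# With a Spark DataFrame `df`, the call would look like:
# df1 = df.mapInPandas(
#     render_html,
#     schema="FirstName string, LastName string, Score_HTML string",
# )

# Local check on a pandas stand-in for one batch.
sample = pd.DataFrame({
    "FirstName": ["Hello", "ABC"],
    "LastName": ["World", "XYZ"],
    "Score": [
        [{"sub": "Math", "score": 90}, {"sub": "Eng", "score": 80}],
        [{"sub": "Math", "score": 90}],
    ],
})
result = next(render_html(iter([sample])))
```

Because the schema string no longer lists Score, the original array column is dropped from the result. Add it back to both the yielded frame and the schema if it is still needed.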

yhuiod9q  #2

Solution:

from jinja2 import Template

def calc(row):
    TEMPLATE = """
    <b>FullName:</b>{{ FirstName }} {{ LastName }} </br></br>
    <table border="1">
        <tr>
            <td>Sub</td>
            <td>Score</td>
        </tr>
        {% for s in Score %}
        <tr>
            <td>{{ s['sub'] }}</td>
            <td>{{ s['score'] }}</td>
        </tr>
        {% endfor %}
    </table>
    """
    TEMPLATE = TEMPLATE.replace("\n", "")
    template = Template(TEMPLATE)
    return template.render(**row)

df = df.toPandas() # Convert SparkDF to Pandas
df['Score_HTML'] = df.apply(calc, axis=1)
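
Two caveats about this approach: toPandas() collects the whole DataFrame onto the driver, so it only suits data that fits in memory there, and `.replace("\n", "")` removes newlines but leaves the indentation spaces in the output. Below is a minimal sketch that strips each template line and compiles the template once instead of once per row. The names `RAW_TEMPLATE`, `to_html` and `pdf` are hypothetical, and the hand-built pandas DataFrame stands in for the result of `df.toPandas()`, with plain dicts in place of Spark Row objects.

```python
import pandas as pd
from jinja2 import Template

RAW_TEMPLATE = """
<b>FullName:</b>{{ FirstName }} {{ LastName }} <br><br>
<table border="1">
    <tr><td>Sub</td><td>Score</td></tr>
    {% for s in Score %}
    <tr><td>{{ s['sub'] }}</td><td>{{ s['score'] }}</td></tr>
    {% endfor %}
</table>
"""

# Strip every line so neither newlines nor indentation reach the output,
# then compile the template a single time.
TEMPLATE = Template("".join(line.strip() for line in RAW_TEMPLATE.splitlines()))

def to_html(row):
    # row is a pandas Series; pass its fields to the template by name.
    return TEMPLATE.render(**row.to_dict())

# Stand-in for df.toPandas() (assumption: same column names as in the question).
pdf = pd.DataFrame({
    "FirstName": ["Hello", "ABC"],
    "LastName": ["World", "XYZ"],
    "Score": [
        [{"sub": "Math", "score": 90}, {"sub": "Eng", "score": 80}],
        [{"sub": "Math", "score": 90}],
    ],
})
pdf["Score_HTML"] = pdf.apply(to_html, axis=1)
pdf = pdf.drop(columns=["Score"])
```

The resulting frame has the FirstName, LastName and Score_HTML columns asked for in the question. It can be turned back into a Spark DataFrame with `spark.createDataFrame(pdf)` if needed.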
