csv 通过与另一个DataFrame进行比较来筛选DataFrame

cgh8pdjw  于 2023-04-09  发布在  其他
关注(0)|答案(1)|浏览(89)

我对使用Python非常陌生,并且一直在编写一个脚本,该脚本允许我下载csv,清理数据并创建可以正确导入到shopify商店的输出文件。
到目前为止,我已经成功地完成了这一点,生成了一个包含要导入到我的商店中的相关数据的每日导入文件。
为了进一步增强代码,我想做的是对输出文件进行过滤,使其只包括发生变化的行。换句话说,如果产品成本或价格发生变化,或者它不再处于预购状态,我希望它被包括在内,但如果产品列表和最新库存中的项目行相同,则应将其从导入中排除。

更新:过去几天我一直在做这件事,我想我已经接近了,但我仍然无法完成最后一步。

我已经为比较创建了两个DataFrame:df_mergeddf_prod_filtered
我根据故障排除建议的脚本生成了以下内容:

Columns:  ['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']
Index column of df_merged:  None
Number of rows:  1802
Column types:
 ID                            int64
Handle                       object
Variant Compare At Price    float64
Template Suffix              object
Variant Inventory Policy     object
Variant Cost                float64
dtype: object

df_prod_filtered
Columns:  ['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']
Index column of df_prod_filtered:  None
Number of rows:  1802
Column types:
 ID                            int64
Handle                       object
Variant Compare At Price    float64
Template Suffix              object
Variant Inventory Policy     object
Variant Cost                float64
dtype: object

很明显, Dataframe 的结构是相同的,并且具有相同的行数,在移除“Discontinued”项目之后将函数移动到。
我最近的尝试是使用掩码比较两个DataFrame。

# create a mask by comparing the two dataframes based on their index (ID column)
mask = df_merged.eq(df_prod_filtered.loc[df_merged.index])

# create a new column in df_merged that shows if there is a difference or not
df_merged['Diff'] = ~mask.all(axis=1)

但是,这会导致所有行都被标记为相同(任何单元格都没有差异)。
为了确认情况不是这样,我手动编辑了df_prod_filtered中的几个单元格,并重复了这个过程,得到了另一个没有明显差异的列表。
所以这是我再次卡住的地方。我需要比较两个 Dataframe ,现在称为df_mergeddf_prod_filtered,使用ID列作为索引。我需要合并或生成一个新的 Dataframe ,它只包含df_merged中的数据与df_prod_filtered中的数据不同的行,并且它需要从df_merged中获取新的行数据。
例如,如果df_merged具有以下内容:

ID,Handle,Variant Compare At Price,Template Suffix,Variant Inventory Policy,Variant Cost
6548971520023,128194,49.99,,deny,32.49
6548974206999,128356,9.99,,deny,6.49

df_prod_filtered

ID,Handle,Variant Compare At Price,Template Suffix,Variant Inventory Policy,Variant Cost
6548971520023,128194,49.99,,deny,32.49
6548974206999,128356,8.99,,deny,6.29

新 Dataframe 应仅包含

ID,Handle,Variant Compare At Price,Template Suffix,Variant Inventory Policy,Variant Cost
6548974206999,128356,9.99,,deny,6.49

现在考虑到我对Python和Pandas的新手性质,我的问题可能在脚本的其他地方,例如如何生成df_prod_filtereddf_merged。因此,这里是整个脚本,其中包含编辑的链接以供参考:

import pandas as pd
import datetime
import requests
import re
import os

# Set filename as current date
now = datetime.datetime.now()
filename = now.strftime("%B%d") + '.csv'

# Define a function to remove the word 'Brick' from the title
def remove_brick(title):
    if 'Booster' in title and 'Brick' in title:
        return title.replace('Brick', '').strip()
    else:
        return title
        
# Clear existing file        
if os.path.exists('stockfeed.csv'):
    os.remove('stockfeed.csv')
    
# Download todays stockfeed.
url = '<CSV URL Goes HERE>'
response = requests.get(url)

if response.status_code == 200:
    with open('stockfeed.csv', 'wb') as f:
        f.write(response.content)
else:
    print('Failed to download CSV file')

# Load the CSV file into a pandas DataFrame
df_a = pd.read_csv('stockfeed.csv')

# Keep only the required columns
df_a = df_a[['item_number', 'name', 'image_path', 'rrp', 'description', 'barcode', 'manufacturer_sku', 'availability', 'publisher', 'price_ex_gst', 'item_group', 'board_game_genre', 'weight_kg', 'game_family']]

# Rename the columns
df_a = df_a.rename(columns={'item_number': 'Handle', 'name': 'Title','image_path': 'Image Src', 'description': 'Body HTML', 'manufacturer_sku': 'Variant SKU','barcode': 'Variant Barcode', 'weight_kg': 'Variant Weight', 'price_ex_gst': 'Variant Cost', 'publisher': 'Vendor'})

# Filter the rows based on criteria
df_a = df_a[(df_a['availability'] == 'Pre-Order')
            & (~df_a['item_group'].isin(['Board Games', 'CCG', 'Puzzles']))
            & (~df_a['game_family'].isin(['Traveller', 'Keyforge', 'SLA Industries', 'Successors', 'Dungeon Crawl', 'Judge Dredd', 'KULT RPG', 'War of the Ring', 'Rocketmen', 'Great Wyrms of Draka', 'Epic Card Game', '13th Age RPG', 'A Game of Thrones A Song of Ice and Fire', 'A Song of Ice and Fire', 'Achtung Cthulhu', 'Achtung Cthulhu 2d20', 'Achtung! Cthulhu Miniatures', 'Adventures & Academia', 'Alien RPG', 'Ashen Stars RPG', 'Avatar Legends', 'Battletech', 'BeyBlade', 'Bicycle', 'Black Void RPG', 'Blue Rose RPG', 'Boss Monster', 'Call to Adventure', 'Castles and Crusades RPG', 'Conan RPG', 'Corolis RPG', 'Cypher', 'Dark Souls', 'Digimon Card Game', 'Doctor Who', 'Divinity', 'Disney', 'Dungeon Crawl Classics', 'Dungeonology', 'Elder Scrolls Call to Arms', 'Fallout RPG', 'Fallout Wasteland Warfare', 'Fantasy AGE', 'Fear Itself RPG', 'Fire & Stone', 'Folklore', 'Forbidden Lands RPG', 'Forbidden Lands', 'Fragged Empire RPG', 'G.I. Joe', 'GameMastery', 'Galaxy Defenders', 'Gatekeeper Dice', 'Halfsies Dice', 'Hero Realms', 'Heroclix', 'Homeworld Revelations RPG', 'Hunter: The Reckoning', 'Infinity RPG', 'John Carter of Mars RPG', 'Jack Vance RPG', 'Invisible Sun', 'Kem Arrow', 'Kids on Bikes', 'Knights of the Round', 'Kobolds Ate My Baby', 'Lamentations RPG', 'Last Aurora', 'Liminal RPG', 'Lord of the Rings RPG', 'Masks', 'Metamorphosis Alpha', 'Modern Age RPG', 'Mutant City Blues RPG', 'Mork Borg RPG', 'Mutant Crawl Classics', 'Mutant Year Zero RPG', 'Mutants & Masterminds', 'My Little Pony', 'Mythos RPG', 'Nerf', 'Night\'s Black Agents RPG', 'Numenera', 'Odyssey of the Dragonlords RPG', 'One Piece', 'Original Adventures Reincarnated', 'Overlight', 'Paladins of the Western Kingdoms', 'Paradox Initiative', 'Pasion de las Pasiones', 'Pirate Borg', 'Planegea RPG', 'PolyHero', 'Power Rangers', 'Robin Laws RPG', 'Rocketmen', 'Ruins of Symbaroum', 'Ruins of Symbaroum RPG', 'Shadowrun', 'Shadows of Brimstone', 'Sorcerer', 'Sorcerer\'s Arena', 'Spirograph', 'Sprue Wave 2', 'Star Realms', 'Star Trek Adventures', 'Star Wars X Wing', 'Starfinder', 'Stargate SG-1 RPG', 'Symbaroum RPG', 'Symbaroum', 'Tales from the Loop RPG', 'Symbaroum RPG - Thistle Hold', 'Tally Ho', 'The Art of', 'The Borellus', 'The Esoterrorists RPG', 'The Excellents RPG', 'The Fantasy Trip', 'The Lost Citadel RPG', 'The Spy Game RPG', 'The One Ring RPG', 'The Strange', 'The Yellow King RPG', 'Things from the Flood RPG', 'Trail of Cthulhu RPG', 'TimeWatch RPG', 'Transformers', 'Tripods & Triplanes', 'Twilight 2000', 'U-Boot', 'Upzone', 'Vaesen Nordic Horror', 'Vornheim RPG The Complete City', 'Vurt RPG', 'Warhammer Fantasy Roleplay', 'World of Tanks', 'World War Cthulhu', 'Yggdrasil']))
            & (~df_a['Image Src'].isin(['https://letsplaygames.com.au/media//catalog/product/placeholder/default/Placeholder_Image-_LPG_Transparent.png']))
            & (~df_a['Title'].str.contains('Coriolis|Power Rangers|Starfinder|Counter|Dice Cups|Dice Bag|Marvel|Homeworld Revelations|LUGU|Card Game|Essence20 Roleplaying System|Infinity Collectible|Class Deck|Castle Falkenstein|Transformers|LPG|Stand with Paints|G.I. Joe|Haunted West|Playing Cards|Metal Coasters|Teenagers From Outer Space|Mekton Zeta|Core Fuzion|Display|Poster|Everyday Heroes RPG', case=False))
            & (~df_a['Vendor'].isin(['Wizards of the Coast', 'Arcane Tinmen', 'Steve Jackson Games', 'Ultra Pro', 'Akora Cards', 'Word Forge Games', 'Ultimate Guard', 'The Op', 'Studio 9 Games', 'Steamforged Games', 'Rebellion Unplugged', 'Plaid Hat Games', 'Pinfinity', 'Pelgrane Press', 'Monte Cook Games', 'Lynnvander Studios', 'Mantic Games', 'Loke BattleMats', 'Lamentations of the Flame', 'Goodman Games', 'Green Ronin Publishing', 'Funko', 'Gamelyn Games', 'Fantasy Flight Games', 'Edge Studios', 'Darrington Press', 'CMON', 'Chaosium', 'Battle Systems', 'Black Site Studios', 'Bandai', 'Atomic Overmind Press', 'Atomic Mass Games', 'Atlas Games', 'Archon Studio', 'Archon Games', '9th Level Games', 'Modiphius Entertainment', 'Troll Lord Games', '','Two Little Mice', 'Dark Horse Books']))]
df_a = df_a[~((df_a['Vendor'].isna()) & (df_a['Variant SKU'].str.contains('BPG0')))]

# Generate New Columns
df_a['Variant Compare At Price'] = round(df_a['rrp'] / 1.1, 4)
df_a['Variant Price'] = df_a['Variant Compare At Price']
df_a['Template Suffix'] = df_a['availability'].apply(lambda x: 'pre-order' if x == 'Pre-Order' else '')
df_a['Variant Inventory Policy'] = df_a['availability'].apply(lambda x: 'continue' if x == 'Pre-Order' else 'deny')
df_a['Type'] = ''
df_a['Tags'] = ''
df_a['Variant Weight Unit'] = 'kg'
df_a['Variant Taxable'] = 'TRUE'
df_a['Variant Requires Shipping'] = 'TRUE'
df_a['Variant Inventory Tracker'] = 'shopify'

# Set Product Keywords
rpgkey = ['Sourcebook', 'Kobold Press', 'Adventure Path', 'Campaign Setting', 'Pathfinder First Edition', 'Pathfinder Second Edition', 'Cyberpunk 2020', 'Cyberpunk RED','Vampire: The Masquarade']
paintkey = ['AK Interactive Auxiliaries', 'Primers', 'Metallics', 'Tones', 'Primer', 'Paint Thinner', 'Acrylic', 'Acrylics', 'Paint Stripper', 'Paint Set', 'Speedpaint', '3Gen Sets', 'Dual Exo Set', 'Pigments', 'Aerosol', 'Colour Set', 'Color Set']
miniaturekey = ['D&D Classic Collection', 'Icons of the Realms', 'Action Figure', 'Miniatures','Idols of the Realms', 'Miniature', 'Pathfinder Battles', 'D&D Frameworks', 'TinkerTurf', 'Wizkids Dungeon Dressings', 'Deep Cuts']
colectablekey = ['Medallion', 'Life-Sized', 'Replicas of the Realms', 'Trophy', 'Plaque', 'Dicelings', 'D&D Cartoon Classics', 'Collectibles','Collectible', 'Collectables', 'Collectable', 'Statue', 'Plush']
accessorykey = ['Mat', 'Playmat', 'Paintbrush', 'Brush', 'Tool', 'Tray', 'Palette', 'Glue', 'Battlemat', 'Megamat', 'Brush Set', 'Insert', 'Drybrush', 'Folio', 'Pathfinder Accessories', 'Flip-Mat', 'Dry Brushes', 'Marker', 'Sheets', 'Brushes Set'] 
dicekey = ['7-Die', 'D20', 'D12', 'D10', 'D100', 'D4', 'd6', 'Dice Set', 'Dice'] 
pbkey = ['Brush', 'PaintBrush']
monkey = ['Monument Hobbies']
mlkey = ['ml']

# Set Product Types 
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in paintkey]), case=False), 'Type'] = 'Paint'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in rpgkey]), case=False), 'Type'] = 'RPG Book'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in miniaturekey]), case=False), 'Type'] = 'Miniature'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in colectablekey]), case=False), 'Type'] = 'Collectable'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in accessorykey]), case=False), 'Type'] = 'Accessory'
df_a.loc[df_a['Title'].str.contains('|'.join([rf'\b{re.escape(kw)}\b' for kw in dicekey]), case=False), 'Type'] = 'Dice'
df_a.loc[(df_a['Title'].str.contains('Critical Role')) & (df_a['Title'].str.contains('Boxed Set')), 'Type'] = 'Miniature'
df_a.loc[df_a['Vendor'] == 'Monument Hobbies', 'Type'] = 'Accessory'
df_a.loc[df_a['Title'].str.contains('|'.join(mlkey), case=False), 'Type'] = 'Paint'

# Remove invalid product types. 
df_a = df_a.dropna(subset=['Type'])

#Define paint colors
paint_colors =  ['Black', 'Blue', 'Brown', 'Green', 'Grey', 'Orange', 'Purple', 'Red', 'Yellow', 'Wood', 'Marble', 'Ashes', 'Dirt', 'Earth', 'Soil', 'Dust', 'Flesh', 'Topaz', 'Rust', 'Acid', 'Magic', 'Arcane', 'Frost', 'Blood', 'Moss', 'Varnish', 'Wash', 'Thinner', 'Satin', 'Umber', 'Sepia', 'Violet', 'Ink', 'Fluorescent', 'Magenta', 'White', 'Skin', 'Brass', 'Gold', 'Bronze', 'Copper', 'Mystic', 'Gloomy', 'Deep', 'Martian', 'Nuclear', 'Imperial', 'Plague', 'Space', 'Lotus', 'Cardinal', 'Velvet', 'Plasma', 'Magma', 'Lava', 'Metal', 'Metallic', 'Stone', 'Neutral', 'Concrete', 'Glacier', 'Glue', 'Crystal', 'Antishine', 'Enamel']

# define functions to split tags and generate new tags column
def split_tags(tags):
    return [tag.strip() for tag in re.findall(r'"[^"]+"|\w+', tags)]
def generate_tags(row):
    tags = []
    if row['Type'] == 'Paint' and ('Set' in row['Title'] or 'Sets' in row['Title'] or 'Briefcase' in row['Title'] or 'Case' in row['Title']):
        tags.append('Paint Set')
    if row['Type'] == 'Miniature' and 'Booster' in row['Title']:
        tags.append('Booster')
    if row['Type'] == 'Miniature' and 'Icons of the Realm' in row['Title']:
        tags.append('Painted')
        tags.append('Icons of the Realm')
    if 'D&D' in row['Title'] or 'Dungeons & Dragons' in row['Title']:
        tags.append('D&D')
    if 'Aerosol' in row ['Title'] or 'Spray' in row ['Title']:
        tags.append('Aerosol')
    if 'Critical Role' in row['Title']:
        tags.append('Critical Role')   
    if 'Cyberpunk RED' in row['Title']:
        tags.append('Cyberpunk RED')  
    if 'Cyberpunk 2020' in row['Title']:
        tags.append('Cyberpunk 2020')  
    if 'Cyberpunk' in row['Title']:
        tags.append('Cyberpunk')          
    if 'Honor Among Thieves' in row['Title']:
        tags.append('Honor Among Thieves')  
    if 'Painted' in row['Title']:
        tags.append('Painted')  
    if 'Unpainted' in row['Title'] or 'Collectors Series' in row ['Title'] or 'Nolzurs Marvelous Miniatures' in row ['Title'] or 'Deep Cuts' in row ['Title']:
        tags.append('Unpainted')  
    if 'Vampire: The Masquerade' in row['Title']:
        tags.append('VTM')  
    if 'Nolzurs Marvelous' in row['Title']:
        tags.append('Nolzurs Marvelous')  
    if 'Pre-Order' in row['availability']:
        tags.append('Pre-Order')  
    if 'd6' in row['Title'] and 'Block' in row ['Title']:
        tags.append('D6 Set')
    if 'Dice Set 7' in row['Title'] or '7-Die Set' in row ['Title']:
        tags.append('RPG Set')
    if 'battlemat' in row['Title'] or 'playmat' in row['Title'] or'megamat' in row['Title'] or'flip mat' in row['Title'] or' mat' in row['Title'] or 'Flip-mat' in row['Title'] or 'Hobby Mat' in row['Title'] or 'Hobby Mat' in row['Title']:
        tags.append('Pre-Order')
    if 'Drybrush' in row['Title'] or 'Paintbrush' in row['Title']:
        tags.append('Paint Brush')         
    if 'Brush Set' in row['Title'] or 'Brushes' in row['Title']:
        tags.append('Paint Brush')
        tags.append('Brush Set')
    if 'Pathfinder' in row['Title']:
        tags.append('Pathfinder')
    if 'Warhammer' in row['Title']:
        tags.append('Warhammer')
    if 'Idols of the Realm' in row['Title']:
        tags.append('2D')
        tags.append('Idols of the Realm')
    if 'Replicas of the Realm' in row['Title']:
        tags.append('Replicas of the Realm')
        tags.append('Collectable')
    if 'Exandria' in row['Title']:
        tags.append('Exandria')
    if 'Dungeon Dressings' in row['Title']:
        tags.append('Dungeon Dressings')
    if 'Portal 2' in row['Title']:
        tags.append('Portal 2')
    if row['Type'] == 'Paint':
        # Check for color in title
        for color in paint_colors:
            if color in row['Title']:
                tags.append(color)
        # Check for color in description
        for color in paint_colors:
            if color in row['Body HTML']:
                tags.append(color)
    return ",".join(tags)

df_a['Tags'] = df_a.apply(generate_tags, axis=1)

# Remove unwanted columns
df_a = df_a.drop(['rrp', 'availability'], axis=1)

# Drop rows with missing values in the "Type" column
df_a = df_a.dropna(subset=['Type'])

# Import the new CSV as df_b
df_b = pd.read_csv('Products.csv')

# Select only the 'Handle' column from df_b
df_b = df_b[['Handle']]

# Convert Handle column to string
df_a['Handle'] = df_a['Handle'].astype(str)
df_b['Handle'] = df_b['Handle'].astype(str)

# Add "-single" to the handle for rows with "Booster" or "Brick" in the title
mask = df_a['Title'].str.contains('Booster Brick')
mask &= ~df_a['Handle'].str.endswith('-single') # Exclude rows with '-single' already present in the Handle
df_a.loc[mask, 'Handle'] = df_a.loc[mask, 'Handle'].astype(str) + '-single'

# Divide Prices of Boosters to single amounts. 
df_a.loc[df_a['Title'].str.contains('Booster|Brick'), ['Variant Price', 'Variant Compare At Price', 'Variant Cost']] /= 8

# Filter out rows where 'Handle' value is already in df_b
df_a = df_a[~df_a['Handle'].isin(df_b['Handle'])]

# Reorder the columns
df_a = df_a[['Handle','Title','Body HTML','Vendor','Type','Tags','Template Suffix','Variant SKU','Variant Barcode','Variant Weight','Variant Weight Unit','Variant Price','Variant Compare At Price','Variant Taxable','Variant Inventory Policy','Variant Requires Shipping','Variant Cost','Image Src']]

#Export New Products List
df_a.to_csv('NewProducts' + filename, index=False)

# Creating the Update Product List
# Load the CSV files into a pandas DataFrame
df_a = pd.read_csv('stockfeed.csv')
df_b = pd.read_csv('Products.csv')[['ID', 'Handle']]
df_prod = pd.read_csv('Products.csv')[['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]

# Keep only the required columns
df_a = df_a[['item_number', 'rrp', 'availability', 'price_ex_gst']]

# Rename the columns
df_a = df_a.rename(columns={'item_number': 'Handle', 'price_ex_gst': 'Variant Cost'})

# Create new columns
df_a['Variant Compare At Price'] = round(df_a['rrp'] / 1.1, 2)
df_a['Template Suffix'] = df_a['availability'].apply(lambda x: 'pre-order' if x == 'Pre-Order' else '')
df_a['Variant Inventory Policy'] = df_a['availability'].apply(lambda x: 'continue' if x == 'Pre-Order' else 'deny')

# Remove unwanted columns
df_a = df_a.drop(['rrp', 'availability'], axis=1)

# Reorder the columns
df_a = df_a[['Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]

# Convert Handle column in CSV A to string
df_a['Handle'] = df_a['Handle'].astype(str)

# Merge CSV files on Handle column
df_merged = pd.merge(df_a, df_b, on='Handle')

# Reorder the columns
df_merged = df_merged[['ID', 'Handle', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]

# Round Variant Compare At Price to 2 decimal places
df_merged['Variant Compare At Price'] = df_merged['Variant Compare At Price'].round(2)

# Export merged CSV file
df_merged.to_csv('ChangedProducts' + filename, index=False)

# Read in the Handle columns from df_b and df_merged
df_c = df_b[['Handle']]
df_d = df_merged[['Handle']]

# Identify discontinued Handles
df_discon = df_c[~df_c['Handle'].isin(df_d['Handle'])]

# Export discontinued Handles to CSV
df_discon.to_csv('UnmatchedProducts' + filename, index=False)

# start filter test 

# create a list of the handles in df_discon
discon_handles = df_discon['Handle'].tolist()

# remove rows from df_prod that have handles in discon_handles
df_prod_filtered = df_prod[~df_prod['Handle'].isin(discon_handles)]
df_prod_filtered = df_prod_filtered.reset_index(drop=True)

# Print column headings, number of rows, and column types for df_merged
print("df_merged")
print("Columns: ", df_merged.columns.tolist())
print("Index column of df_merged: ", df_merged.index.name)
print("Number of rows: ", len(df_merged))
print("Column types: \n", df_merged.dtypes)

# Print column headings, number of rows, and column types for df_prod_filtered
print("\ndf_prod_filtered")
print("Columns: ", df_prod_filtered.columns.tolist())
print("Index column of df_prod_filtered: ", df_prod_filtered.index.name)
print("Number of rows: ", len(df_prod_filtered))
print("Column types: \n", df_prod_filtered.dtypes)

# create a mask by comparing the two dataframes based on their index (ID column)
mask = df_merged.eq(df_prod_filtered.loc[df_merged.index])

# create a new column in df_merged that shows if there is a difference or not
df_merged['Diff'] = ~mask.all(axis=1)
#print(df_merged)

# Print only rows where Diff is False and a string
#print(df_merged[df_merged["Diff"] == 'False'])

# Print only rows where Diff is False as a boolean
##df_merged = df_merged[df_merged["Diff"] != True]

#print(df_merged)

# end Filter test
ttvkxqim

ttvkxqim1#

好的!我想通了这一点,事实证明,可能我已经尝试过的许多方法可能已经解决了这个问题。
我遇到的问题,没有意识到,是Pandas认为NaN值不同于其他NaN值。
通过将NaN值替换为Placeholder字符串,我能够仅使用更新的条目创建dataframe,然后再次使用numpy将Placeholder替换为NaN值。
我还没有将新代码插入到整个脚本中,但是这个问题的具体目标/问题已经解决了,作为参考,下面是我在测试和解决这个问题时创建的脚本:

import pandas as pd
import numpy as np

# Load the UpdateProducts and base product list dataframes
df_upd = pd.read_csv('ChangedProducts.csv')
df_prod = pd.read_csv('Products.csv')

df_prod = df_prod[['ID', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]
df_upd = df_upd[['ID', 'Variant Compare At Price', 'Template Suffix', 'Variant Inventory Policy', 'Variant Cost']]

# set index to ID column
df_prod.set_index('ID', inplace=True)
df_upd.set_index('ID', inplace=True)

# replace nan values with placeholder
df_upd = df_upd.fillna('EMPTYCELLPLACEHOLDER')
df_prod = df_prod.fillna('EMPTYCELLPLACEHOLDER')

# compare the two dataframes element-wise
comparison = df_upd.eq(df_prod)

# get the rows where all values are True
all_same_rows = comparison.all(axis=1)

# get only the rows that are different
different_rows = df_upd[~all_same_rows]

# replace the placeholder
different_rows = different_rows.replace('EMPTYCELLPLACEHOLDER', np.nan)

# save the different rows to a new CSV file
different_rows.to_csv('DifferentRows.csv', index=True)

相关问题