Map function that needs to be faster

Posted by 2exbekwf on 2021-05-30 in Hadoop


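I am trying to optimize my code to run on a Hadoop cluster. Can anyone help me find ways to make it better? I am taking in a very large set of numbers, more than 40 million, with each number on its own line. As the numbers are read in, I count every number, sum all the numbers, and check whether each number is prime.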


#!/usr/bin/env python

import sys
import string
import math

total_of_primes = 0
total = 0
count = 0
not_prime = 0
count_string = 'Count:'
total_string = 'Total:'
prime_string = 'Number of Primes:'

for line in sys.stdin:
  try:
    key = int(line)
  except:
    continue
  total = total + key
  count = count + 1
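  if key < 2:
    not_prime = not_prime + 1  # 0, 1, and negative numbers are not prime
  elif key == 2 or key == 3:
    pass  # 2 and 3 are prime, so not_prime stays unchanged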
  elif key%2 == 0 or key%3 == 0:
    not_prime = not_prime + 1
  else:  
    for i in range(5,(int(math.sqrt(key))+1),6):
      if key%i == 0 or key%(i+2) ==0:
        not_prime = not_prime + 1
        break

total_of_primes = count - not_prime  

print '%s\t%s' % (count_string,count)
print '%s\t%s' % (total_string,total)
print '%s\t%s' % (prime_string,total_of_primes)
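The script above prints three tab-separated key/value lines per map task. If the job runs with more than one map task, those partial results still have to be added together. A minimal sketch of that summing step, assuming the mapper output format shown above; the function name `reduce_totals` is hypothetical and not part of the original post:

```python
def reduce_totals(lines):
    """Sum the integer values of identical keys from tab-separated lines."""
    totals = {}
    for line in lines:
        # Split on the first tab only; the key is everything before it.
        key, sep, value = line.rstrip('\n').partition('\t')
        if not sep:
            continue  # skip lines that contain no tab
        try:
            totals[key] = totals.get(key, 0) + int(value)
        except ValueError:
            continue  # skip lines whose value is not an integer
    return totals
```

In a streaming reducer script this could be called as `reduce_totals(sys.stdin)` and the resulting dictionary printed one `key<TAB>value` pair per line.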

hadoop mapreduce python optimization primes

Source: https://stackoverflow.com/questions/29443171/map-function-that-needs-to-be-faster

1 Answer


axr492tv #1

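I tried turning everything into a comprehension. Comprehensions are generally faster than the equivalent explicit Python loop, because more of the iteration is handled inside the interpreter's C code. I also omitted the tests for 2 and 3, since you can add those in manually once the loop is complete.
I can almost guarantee that this will have bugs, as I don't have your test data, and a comprehension this large (to me, anyway) really needs testing. It's technically a one-liner, but I tried to split it up for readability. Hopefully it at least gives you some ideas.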

import sys

biglist = [ # one boolean per input number: True means the number is NOT prime
    int(line) < 2 or # 0 and 1 are not prime
    not int(line)%2 or # divisible by 2
    not int(line)%3 or # divisible by 3
    any( # any() stops at the first divisor found, so no takewhile() is needed
        not int(line)%i or # divisible by an item in the range() object
        not int(line)%(i+2) # or by 2 greater than that item
        for i in range(5, int(pow(int(line), 0.5))+1, 6) # pow(x, 0.5) uses a built-in instead of an imported module
    )
    for line in sys.stdin # going through each item in sys.stdin
    if line.strip().isdigit() # as long as it's a non-negative integer
]

# 2 and 3 are flagged as not prime above, so add them manually instead of testing them.
# This assumes 2 and 3 each occur exactly once in the input.
total_of_primes = biglist.count(False) + 2
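Because a single large comprehension is hard to test, one option is to move the primality check into a small function that can be checked on its own, then count with a generator expression so that no 40-million-element list is built. This is only a sketch using the same 6k ± 1 trial division as the question; the names `is_prime`, `parse_int`, and `count_primes` are hypothetical, and it handles 2 and 3 directly rather than adding them afterwards:

```python
def is_prime(n):
    """Trial division over candidates of the form 6k - 1 and 6k + 1."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:  # integer-only bound, no floating-point square root
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


def parse_int(line):
    """Return the integer on the line, or None if the line is not an integer."""
    try:
        return int(line)
    except ValueError:
        return None


def count_primes(lines):
    """Count the prime values among the lines that parse as integers."""
    numbers = (parse_int(line) for line in lines)
    return sum(1 for n in numbers if n is not None and is_prime(n))
```

With this split, `count_primes(sys.stdin)` would give the prime count for a stream of lines, and `is_prime` can be verified against a short list of known primes before running on the full data set.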

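If the execution time still isn't fast enough, you might consider moving to a lower-level (slower to write, faster to run) language, such as C.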

Answered 2021-05-30
