test_taar_similarity.test_compute_donors causing test failures #238

@acmiyaguchi

def test_compute_donors(spark, addon_whitelist, multi_clusters_df):
    multi_clusters_df.createOrReplaceTempView("longitudinal")
    # Perform the clustering on our test data. We expect
    # 3 clusters out of this and 10 donors.
    _, donors_df = taar_similarity.get_donors(spark, 3, 10, addon_whitelist, random_seed=42)

________________________________________________ test_compute_donors ________________________________________________

spark = <pyspark.sql.session.SparkSession object at 0x7fa2c3b17f10>
addon_whitelist = ['system-addon-guid', 'var-0-guid-0', 'var-0-guid-1', 'var-0-guid-2', 'var-1-guid-0', 'var-1-guid-1', ...]
multi_clusters_df = DataFrame[client_id: string, normalized_channel: string, geo_city: array<strin...ar_parent_browser_engagement_unique_domains_count: array<struct<value:bigint>>]

    def test_compute_donors(spark, addon_whitelist, multi_clusters_df):
        multi_clusters_df.createOrReplaceTempView("longitudinal")
    
        # Perform the clustering on our test data. We expect
        # 3 clusters out of this and 10 donors.
>       _, donors_df = taar_similarity.get_donors(spark, 3, 10, addon_whitelist, random_seed=42)

tests/test_taar_similarity.py:263: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mozetl/taar/taar_similarity.py:151: in get_donors
    clusters = compute_clusters(addons_df, num_clusters, random_seed)
mozetl/taar/taar_similarity.py:101: in compute_clusters
    model = pipeline.fit(addons_df)
.tox/py27/local/lib/python2.7/site-packages/pyspark/ml/base.py:132: in fit
    return self._fit(dataset)
.tox/py27/local/lib/python2.7/site-packages/pyspark/ml/pipeline.py:109: in _fit
    model = stage.fit(dataset)
.tox/py27/local/lib/python2.7/site-packages/pyspark/ml/base.py:132: in fit
    return self._fit(dataset)
.tox/py27/local/lib/python2.7/site-packages/pyspark/ml/wrapper.py:288: in _fit
    java_model = self._fit_java(dataset)
.tox/py27/local/lib/python2.7/site-packages/pyspark/ml/wrapper.py:285: in _fit_java
    return self._java_obj.fit(dataset._jdf)
.tox/py27/local/lib/python2.7/site-packages/py4j/java_gateway.py:1160: in __call__
    answer, self.gateway_client, self.target_id, self.name)
.tox/py27/local/lib/python2.7/site-packages/pyspark/sql/utils.py:63: in deco
    return f(*a, **kw)
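
The traceback above stops inside the py4j wrapper, so the underlying error is not visible in this excerpt. To reproduce it in isolation, something like the following sketch may help. It only builds the pytest arguments that select the failing test; the file path and test name come from the traceback, while the helper name `build_pytest_args` and the choice of flags are assumptions for illustration, not part of the repository.

```python
# Path and test name are taken from the traceback above.
TEST_FILE = "tests/test_taar_similarity.py"
TEST_NAME = "test_compute_donors"


def build_pytest_args(test_file=TEST_FILE, test_name=TEST_NAME):
    """Return pytest arguments that select only the failing test.

    Hypothetical helper for illustration; not part of the repository.
    """
    # pytest addresses a single test by its node ID: "<file>::<test function>".
    node_id = "{}::{}".format(test_file, test_name)
    # -x stops at the first failure; --tb=long prints the full traceback.
    return [node_id, "-x", "--tb=long"]


# Print the arguments so they can be pasted after the `pytest` command.
print(" ".join(build_pytest_args()))
```

Assuming an environment with pytest and pyspark installed (for example the `py27` tox environment seen in the traceback paths), the returned list could be passed to `pytest.main(build_pytest_args())` from the repository root, or the printed string appended to a `pytest` command line.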
