Moh-Yakoub (Contributor) commented Apr 5, 2021

Overview

Closes #249

I noticed a type deduction error when a field contains NaNs, as described in the attached issue.
I further noticed that we don't check whether string columns can be cast to a float/int type. I've added an extra check to see if a string column can be cast to an int/float column and deduce the proper data_type accordingly.
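
For context, here is a small illustration of how pandas itself treats such columns. The values are made up for the example and are not taken from the issue's dataset; the point is only that a missing value changes the dtype pandas reports, which any dtype-based inference then has to account for.

```python
import pandas as pd

# Integers with a missing value: pandas upcasts to float64 so it can hold NaN.
ints_with_gap = pd.Series([150, None, 48842])

# Numeric strings with a missing value, stored as a generic object column.
strings_with_gap = pd.Series(["150", None, "48842"], dtype=object)

# pd.to_numeric parses the numeric strings and keeps the missing entry as NaN.
recovered = pd.to_numeric(strings_with_gap)

print(ints_with_gap.dtype)
print(strings_with_gap.dtype)
print(recovered.dtype)
```

The object column carries no numeric dtype on its own, so a check based on dtype alone cannot tell it apart from a genuinely textual column.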

Update (04/10/2021)

  • Added a warning message when NaN columns are displayed as histograms
  • Resolved the error caused by NaN series in histogram execute_binning
  • Rewrote is_numeric_nan_column to run 2x faster
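
As a rough sketch of the first two points above (not the code merged in this PR), binning can drop missing values first and warn about how many were dropped. The function name `bin_ignoring_nan` and its signature are hypothetical; only `numpy.histogram`, `warnings.warn`, and the pandas calls are real APIs.

```python
import warnings

import numpy as np
import pandas as pd


def bin_ignoring_nan(series: pd.Series, bins: int = 10):
    """Hypothetical helper: histogram a numeric column, skipping NaNs."""
    n_missing = int(series.isna().sum())
    if n_missing:
        # Tell the user that the histogram does not cover every row.
        warnings.warn(
            f"{n_missing} NaN value(s) in column {series.name!r} "
            "were dropped before binning."
        )
    # np.histogram cannot derive a finite range from data containing NaN,
    # so only the non-missing values are passed in.
    values = series.dropna().to_numpy(dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges


counts, edges = bin_ignoring_nan(pd.Series([1.0, 2.0, np.nan, 4.0], name="x"), bins=3)
```

Dropping rather than imputing keeps the bin counts honest; the warning makes the omission visible.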

Changes

I've added logic to:

  1. Check if the column can be cast to int/double
  2. Apply the respective data_type inference.
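
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation in this PR: the helper name `castable_numeric_kind` and its return values are assumptions, and the mapping from the result to a Lux data_type is left out.

```python
import pandas as pd


def castable_numeric_kind(series: pd.Series):
    """Hypothetical helper: return "int", "float", or None for a column."""
    non_null = series.dropna()
    if non_null.empty:
        return None  # nothing to infer from
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    converted = pd.to_numeric(non_null, errors="coerce")
    if converted.isna().any():
        return None  # at least one value is not numeric
    # Whole-number values can be treated as integers, the rest as floats.
    if (converted % 1 == 0).all():
        return "int"
    return "float"


print(castable_numeric_kind(pd.Series(["150", None, "48842"], dtype=object)))
print(castable_numeric_kind(pd.Series(["1.5", "2"], dtype=object)))
print(castable_numeric_kind(pd.Series(["abc", "2"], dtype=object)))
```

Missing values are dropped before the check so that a NaN alone does not disqualify an otherwise numeric column.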

Example Output

[Screenshot: Screen Shot 2021-04-05 at 11 51 40 PM]

The result shows that the two columns mentioned in the issue, "# Instances" and "# Attributes", now have the correct data_type.

codecov bot commented Apr 5, 2021

Codecov Report

Merging #343 (eeee236) into master (1a72332) will increase coverage by 0.27%.
The diff coverage is 92.68%.

❗ Current head eeee236 differs from pull request most recent head 02a6a99. Consider uploading reports for the commit 02a6a99 to get more accurate results

@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
+ Coverage   79.98%   80.25%   +0.27%     
==========================================
  Files          50       50              
  Lines        3612     3632      +20     
==========================================
+ Hits         2889     2915      +26     
+ Misses        723      717       -6     
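
The percentages in the diff above follow from hits divided by lines. A small check, assuming the figures are truncated rather than rounded to two decimals (an assumption that happens to reproduce the reported numbers; the report does not state its rounding rule):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises near a boundary.
    return (hits * 10_000 // lines) / 100


# Hits plus misses should account for every line in each column of the diff.
assert 2889 + 723 == 3612
assert 2915 + 717 == 3632

base = coverage_percent(2889, 3612)  # master (1a72332)
head = coverage_percent(2915, 3632)  # #343 (eeee236)
delta = round(head - base, 2)

print(base, head, delta)
```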
Impacted Files                               Coverage Δ
lux/vislib/altair/Choropleth.py              94.20% <0.00%> (ø)
lux/vislib/matplotlib/ScatterChart.py        76.47% <50.00%> (+1.17%) ⬆️
lux/action/univariate.py                     90.90% <100.00%> (+0.52%) ⬆️
lux/executor/PandasExecutor.py               96.07% <100.00%> (+0.06%) ⬆️
lux/utils/utils.py                           90.12% <100.00%> (+1.23%) ⬆️
lux/vislib/matplotlib/MatplotlibRenderer.py  87.30% <100.00%> (+0.63%) ⬆️
lux/interestingness/interestingness.py       87.56% <0.00%> (+1.08%) ⬆️
lux/vislib/matplotlib/Heatmap.py             98.33% <0.00%> (+1.66%) ⬆️
lux/vislib/altair/ScatterChart.py            96.96% <0.00%> (+3.03%) ⬆️
... and 1 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1a72332...02a6a99.

dorisjlee (Member) commented Apr 10, 2021

Thanks @Moh-Yakoub! I made some changes to resolve the related issue in #249 and rewrote the helper function in a more optimized way. Congrats on your first contribution to Lux!

dorisjlee merged commit bab48ff into lux-org:master on Apr 10, 2021
Linked issue: Misdetected data type when numerical column contains null