Skip to content

Conversation

@Erotemic
Copy link
Contributor

@Erotemic Erotemic commented Apr 29, 2023

This is an implementation of the idea proposed in #103997.

It intercepts the argument passed to -c, and removes common leading whitespace from each line in the argument.

Given an input string, the algorithm overview is:

  • split the string into lines
  • count the number of leading whitespace characters for each line
  • keep track of the minimum leading spaces, but ignore lines that contain no non-whitespace characters.
  • if number of common whitespace charcters is non-zero, then loop over all lines again and remove that number of leading spaces (again ignoring the lines that are entirely whitespace).
  • rejoin the new lines into a new string and continue the pymain-run-command function with that.

Big thanks to @sunmy2019 who really helped clean this PR up.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@ghost
Copy link

ghost commented Apr 29, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@Erotemic Erotemic force-pushed the dedent_pymain_command branch from 71c6b63 to 51c1320 Compare April 29, 2023 23:20
@Erotemic
Copy link
Contributor Author

I was able to cobble my way through the C docs and write what I think is a reasonable attempt at a pure C dedent function (I forgot how fun -- albiet time consuming --- pointer logic can be).

To highlight some of the corner cases that need to be accounted for, this is the test case I'm using locally:

python -c "
import subprocess
# Use $ to note when a line will have all whitespace
lines = '''
         $

  $
    data = \"\"\"
    
    this data has newlines above and below  $
                            $

    \"\"\"
    if 1:         $
        print(123)$

    print(12345)
    print(repr(data))
'''.replace('$', '')
subprocess.run(['./python', '-c', lines])
"

I still haven't handled tabs, but I think my space logic is correct. I do need help vetting my C code and fixing the memory and security problems with it.

@sunmy2019
Copy link
Member

sunmy2019 commented Apr 30, 2023

Delete all your wchar_t stuff. It is error-prone and unnecessary.

Delete all your _unicode_dedent. It is lengthy. Do utf_8_bytes_dedent is much more concise and simpler.

@sunmy2019
Copy link
Member

You should act fast since 3.12 release window will soon close. No new feature after May 8.
https://peps.python.org/pep-0693/

Can you give me write access to your branch?

@Erotemic Erotemic force-pushed the dedent_pymain_command branch from 06667b7 to 9649590 Compare May 1, 2023 02:08
@Erotemic
Copy link
Contributor Author

Erotemic commented May 1, 2023

@sunmy2019 I gave you access to my cpython fork.

@sunmy2019
Copy link
Member

I got one thought:

textwrap.dedent remove space and tabs.
https://docs.python.org/3/library/textwrap.html#textwrap.dedent

textwrap.dedent("""
 \t\t1
   2""")
'\n\t\t1\n  2'

It requires remembering the exact "space and tab prefix". Should we implement this?

@Erotemic
Copy link
Contributor Author

Erotemic commented May 1, 2023

It requires remembering the exact "space and tab prefix". Should we implement this?

Probably, but off the top of my head I'm not sure what the most elegant way to do it would be. I suppose instead of maintaining the tab and space counts, it would be sufficient to maintain the current shortest whitespace sequence, and then check how much of that sequence new lines match, shortening the sequence as necessary. I can give that a try.

On a different note, I'm sure that new features will want tests associated with them, but I'm not sure where the other "-c" tests live. Do you know where the best spot to put tests would be?

@sunmy2019
Copy link
Member

sunmy2019 commented May 1, 2023

I got one thought:

textwrap.dedent remove space and tabs.

@Erotemic, take a look. I have implemented it with production quality at d336ac7.

Edit your #103998 (comment) to make it clear and concise, so that we can attract more reviewers.

@sunmy2019
Copy link
Member

On a different note, I'm sure that new features will want tests associated with them, but I'm not sure where the other "-c" tests live. Do you know where the best spot to put tests would be?

I guess Lib/test/test_cmd_line.py

@sunmy2019
Copy link
Member

sunmy2019 commented May 1, 2023

We need someone with write access to review this.

@vstinner as the recent maintainer of this file.
@hauntsaninja