If you are getting an exception when running an Azure Data Lakes Analytics U-SQL query against a CSV source file, you may get an inscrutable error like the following:
ERROR
VertexFailedFast. Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][0] with error: Vertex user code error.
DESCRIPTION
Vertex failed with a fail-fast error
RESOLUTION
DETAILS
Vertex SV1_Extract[0][0].v2 {43B20D9E-E63F-48AF-8E9A-FFFAE288FCB8} failed
Error:
Vertex user code error
exitcode=CsExitCode_StillActive Errorsnippet=An error occurred while processing adl://adlmvp.azuredatalakestore.net/Zoiner/ExtentAligned/On_Time_On_Time_Performance_2014_1_OneColumn.csv
In my experience so far, the root causes you can check for that trigger this error include:
- The columns in your U-SQL query don't line up with the number of columns in the CSV. For example, you might be missing a column in your query. Double check you have exactly the right number of columns.
- The data type for a given column in U-SQL cannot be used to parse the string value from the CSV. For example, you might have an int as the data type, but a value in the CSV is a string like "N/A". If all else fails, start with every column having a value of string and then gradually add the correct data types. This will help you ensure your problem is not really a case of (1) above (schema issue).
- Your CSV file has a header row. This is really related to the previous cause. In essence, the CSV header row would have string values for all cells. Naturally if your schema tries to parse that as anything other than a string then it will fail. To resolve this, filter out the header row when querying the CSV:
@r1 = EXTRACT id string, name string, street string, city string, zip string, age string FROM "/temp/CsvWithHdr.csv" USING Extractors.Csv(); @option1_knownheader = SELECT Int32.Parse(id) AS id, name, street, city, zip, string.IsNullOrEmpty(age) ? (Int16?) null: (Int16?) Int16.Parse(age) AS age FROM @r1 WHERE id != "Id"; OUTPUT @option1_knownheader TO "/temp/opt1.csv" USING Outputters.Csv(); @option2_tryparse = SELECT Int32.Parse(id) AS id, name, street, city, zip, String.IsNullOrEmpty(age) ? (Int16?) null : (Int16?) Int16.Parse(age) AS age FROM @r1 WHERE ((Func<string, bool>)(p => { Int32 dummy; return Int32.TryParse(p, out dummy); }))(id); OUTPUT @option2_tryparse TO "/temp/opt2.csv" USING Outputters.Csv();
Kudos go out to Micheal Rys for this excellent example script.
Comments